RM Monitoring v.1.1 now includes CPU & GPU usage: temperatures and power consumption.
For the full list of sensors, please visit the Sensors page.
Dependency for Ubuntu: the target system must have the nvidia-smi and sensors utilities installed.
A shortcut to the configuration file is available from Start Menu -> Comino -> rm-monitor.yml. Open the file in Notepad.
SerialPort:
  Name: auto  # Do not edit; this is the controller's virtual port
  RequestPeriodSecs: 5  # Controller polling interval, in seconds
InternalDB:
  Enabled: true  # Change to false if you do not need the local database
  RetentionPeriodDays: 120  # How long the local database keeps data, in days
HttpServer:
  Address: localhost  # Address at which the API will be available
  Port: 20000  # Port at which the API will be available
InfluxDB:
  Enabled: true
  BucketName: rm-monitor-bucket
  ServerURL: http://localhost:8086
  AuthToken: <paste the token you copied from InfluxDB here>
  OrganizationName: default-org  # Replace with your organization name from InfluxDB
To use the local Grafana interface, click the RM Monitor shortcut at Start Menu -> Comino -> Open RM Monitor, or go to http://localhost:3000/d/-GhXmAY7k/rm
The panel consists of a number of "health" sensors and uses color to indicate when something has happened or requires attention. New sensors:
CPU and GPU temperatures and total energy consumption (CPU and GPU power) have been added alongside the air and coolant inlet and outlet temperatures and the calculated coolant system efficiency.
Shows the power consumption of the processor and video cards in real time.
Shows the rotations per minute (RPM) of each fan and pump.
OK – everything is OK, no notifications or alarms
Warning – approaching the critical threshold
Critical – critical error; the controller turns everything off, and startup is blocked until the temperature returns to the green zone
Sensor Name | ID | Critical (low) | Warning (low) | OK | Warning (high) | Critical (high)
---|---|---|---|---|---|---
COOLANT IN | T1 | ..1 | 2..3 | 4..57 | 58..59 | 60..
COOLANT OUT | T2 | ..1 | 2..3 | 4..57 | 58..59 | 60..
AIR IN | T3 | ..1 | 2..3 | 4..34 | 35..37 | 38..
AIR OUT | T4 | ..1 | 2..3 | 4..59 | 60..64 | 65..
VOLTAGE 12V | V | ..10.7 | 10.8..11.3 | 11.4..12.6 | 12.7..13.2 | 13.3..
HUMIDITY | RH | 0..19 | 20..29 | 30..59 | 60..85 | 86..100
STM | T0 | ..-20 | -19..0 | 1..74 | 75..77 | 78..
VRM | T5 | ..-20 | -19..0 | 1..89 | 90..104 | 105..
PCB | T6 | ..-20 | -19..0 | 1..99 | 100..117 | 118..
* All temperatures in the table are in degrees Celsius.
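The threshold table can also be read programmatically. The following is a minimal Python sketch, not part of RM Monitoring itself, that maps a reading to its zone. The names `THRESHOLDS`, `ZONES`, and `classify` are hypothetical, and keying the sensors by the short identifiers from the table (T1, V, RH, and so on) is an assumption. The table lists whole-number ranges, so the sketch also assumes that a fractional reading between two ranges (for example 57.5 for COOLANT IN) belongs to the lower of the two zones.

```python
from bisect import bisect_right

# Lower bounds of the Warning (low), OK, Warning (high) and Critical (high)
# zones, copied from the threshold table. Readings below the first bound
# fall into the Critical (low) zone.
THRESHOLDS = {
    "T1": (2, 4, 58, 60),            # COOLANT IN
    "T2": (2, 4, 58, 60),            # COOLANT OUT
    "T3": (2, 4, 35, 38),            # AIR IN
    "T4": (2, 4, 60, 65),            # AIR OUT
    "V":  (10.8, 11.4, 12.7, 13.3),  # VOLTAGE 12V
    "RH": (20, 30, 60, 86),          # HUMIDITY
    "T0": (-19, 1, 75, 78),          # STM
    "T5": (-19, 1, 90, 105),         # VRM
    "T6": (-19, 1, 100, 118),        # PCB
}

# Zone names in table order, from the lowest readings to the highest.
ZONES = ("Critical", "Warning", "OK", "Warning", "Critical")


def classify(sensor_id, value):
    """Return "OK", "Warning" or "Critical" for a reading of the given sensor."""
    bounds = THRESHOLDS[sensor_id]
    # bisect_right counts how many lower bounds the value has reached,
    # which is exactly the index of its zone.
    return ZONES[bisect_right(bounds, value)]
```

For instance, `classify("T3", 36)` falls in the 35..37 range for AIR IN and returns `"Warning"`, while `classify("V", 12.0)` returns `"OK"`.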
To get sensor data through the REST API, go to http://localhost:20000/sensors
See the Sensors table for more details.
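The same endpoint can be queried from a script. Below is a minimal Python sketch that uses only the standard library. The function names `sensors_url` and `fetch_sensors` are hypothetical, the defaults mirror the HttpServer section of the configuration file, and the sketch assumes the endpoint returns a JSON body; the response format is not described here, so adjust the decoding if it differs.

```python
import json
from urllib.request import urlopen


def sensors_url(address="localhost", port=20000):
    """Build the sensors endpoint URL from the HttpServer Address and Port."""
    return f"http://{address}:{port}/sensors"


def fetch_sensors(address="localhost", port=20000, timeout=5.0):
    """Request the sensors endpoint and decode the body.

    Assumption: the body is JSON. json.loads raises ValueError otherwise.
    """
    with urlopen(sensors_url(address, port), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage on the RM machine itself (requires the service to be running):
#   data = fetch_sensors()
#   print(json.dumps(data, indent=2))
```

If you change Address or Port in the configuration file, pass the same values to `fetch_sensors`.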
Before connecting to the monitoring system, make sure that your RM machine and the monitoring system are on the same network. If the monitoring system runs in your local network, you can connect right away. Otherwise, you need to translate the destination IP address of the internal server to a public IP address (DNAT).
To connect to a cloud InfluxDB instance, paste the server URL, authentication token, and organization name into the configuration file.
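Before restarting the service, it can help to check that none of these values were left empty or still hold the template placeholder. The following Python sketch is one possible check, not a feature of RM Monitoring. It assumes the InfluxDB section of rm-monitor.yml has already been loaded into a dictionary (for example with a YAML parser of your choice); the names `REQUIRED_KEYS` and `influx_config_problems` are hypothetical.

```python
from urllib.parse import urlparse

# Keys of the InfluxDB section that must hold a real value.
REQUIRED_KEYS = ("BucketName", "ServerURL", "AuthToken", "OrganizationName")


def influx_config_problems(section):
    """Return a list of human-readable problems found in the InfluxDB section."""
    problems = []
    if not section.get("Enabled", False):
        problems.append("Enabled is not true")
    for key in REQUIRED_KEYS:
        value = str(section.get(key, "")).strip()
        if not value:
            problems.append(f"{key} is empty")
        elif "<" in value or ">" in value:
            # Angle brackets suggest the template text was not replaced.
            problems.append(f"{key} still contains a <placeholder>")
        elif key == "ServerURL":
            url = urlparse(value)
            if url.scheme not in ("http", "https") or not url.netloc:
                problems.append("ServerURL is not a valid http(s) URL")
    return problems
```

An empty list means the section passed these basic checks. It does not confirm that the token or organization name is accepted by the InfluxDB server.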
If you notice a bug in a sensor report or an undefined value, please email monitoring@comino.com. Your suggestions for monitoring improvements are also appreciated. Thanks!