Sensorlogger
Statistics & Logbooks
Every statistical evaluation of the collected data is bound to a logbook column. To obtain the result of a statistical operation, you have to configure a logbook column. This applies whether the logbook is purely virtual (never written to disk, because you only want to broadcast your statistics via MQTT) or is actually written as a real file.
When written to disk, logbooks are tab-separated text files in which the values collected during the most recent measurement cycle are statistically summarized and recorded. You can set up any number of logbooks, each with its own set of columns for different statistical operations, sensors and cycle times. A very simple example of the content of a logbook file is shown below. It records the 30-minute mean values of two different sensors.
# Time	Temp [°C]	Humidity [%RH]
2021-01-31 15:00:00	2.79615	65.0923
2021-01-31 15:30:00	1.182	70.46
2021-01-31 16:00:00	0.90467	75.36
2021-01-31 16:30:00	0.56	75.8733
2021-01-31 17:00:00	0.0286667	80.5067
2021-01-31 17:30:00	-0.499333	84.76
2021-01-31 18:00:00	-0.926	89.06
2021-01-31 18:30:00	-1.14067	89.1933
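For downstream processing, a logbook file can be read back with a few lines of Python. The following is a minimal sketch and not part of Sensorlogger: the function name `read_logbook` and the sample text are hypothetical. It assumes one header line starting with `#`, tab-separated columns, timestamps in the format `YYYY-MM-DD HH:MM:SS`, and `"-"` as the marker for missing data.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample text, assuming tab-separated columns as described above.
SAMPLE = (
    "# Time\tTemp [°C]\tHumidity [%RH]\n"
    "2021-01-31 15:00:00\t2.79615\t65.0923\n"
    "2021-01-31 15:30:00\t1.182\t70.46\n"
    "2021-01-31 16:00:00\t-\t75.36\n"
)


def read_logbook(text, missing_data="-"):
    """Parse logbook text into (titles, rows); missing entries become None."""
    titles, rows = [], []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip empty lines
        if fields[0].startswith("#"):
            # Header line: drop the comment marker from the first title.
            titles = [fields[0].lstrip("# ")] + fields[1:]
            continue
        stamp = datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S")
        values = [None if f == missing_data else float(f) for f in fields[1:]]
        rows.append((stamp, values))
    return titles, rows


titles, rows = read_logbook(SAMPLE)
```

To read a real file, pass the file's text content to `read_logbook` instead of `SAMPLE`.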
Logbook configurations
The logbooks key in the configuration file is always a JSON array that may contain any number of logbook configurations.
"logbooks": [
  ...
  {
    "filename": "/home/user/weather/weatherlog.txt",
    "cycle_time": {"value": 15, "unit": "min"},
    "max_entries": 96,
    "missing_data": "-",
    "columns": [ ... ]
  },
  ...
]
"filename": Specifies the path and filename where the logbook shall be saved. If set to null, the logbook will not be saved as a real file, but statistics will still be published via MQTT or sent to the HomeMatic CCU, if intended.
Standard value: null
"cycle_time": Duration of a measurement cycle. Please note that the cycle time should be well above the rest period (polling intervals) of the individual sensors in order to accumulate some values for the statistics to make sense. The numerical part for this parameter is set under "value", its unit under "unit". The following units are allowed: "ms", "s", "min", "h", "d".
Standard value: 15 min
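To make the value/unit notation concrete, the sketch below converts such a duration object into a Python `timedelta`. It is an illustration only: the helper name `to_timedelta` and the lookup table are hypothetical and say nothing about how Sensorlogger parses the setting internally. Only the five units listed above are accepted.

```python
from datetime import timedelta

# Units allowed by the documentation, mapped to timedelta keyword arguments.
UNIT_TO_KWARG = {
    "ms": "milliseconds",
    "s": "seconds",
    "min": "minutes",
    "h": "hours",
    "d": "days",
}


def to_timedelta(duration):
    """Convert {"value": ..., "unit": ...} into a timedelta (hypothetical helper)."""
    unit = duration["unit"]
    if unit not in UNIT_TO_KWARG:
        raise ValueError(f"unsupported unit: {unit!r}")
    return timedelta(**{UNIT_TO_KWARG[unit]: duration["value"]})


cycle = to_timedelta({"value": 15, "unit": "min"})
cycle_seconds = cycle.total_seconds()
```

The same notation is used for a column's "evaluation_period", so one helper of this kind would cover both settings.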
"max_entries": Maximum number of logbook entries (i.e. lines). If the number is exceeded, the oldest entries will be deleted.
Standard value: 30
"missing_data": String that is used for entries where data is missing. This can happen if a sensor cannot be reached during the whole measurement cycle or if the rest period of a sensor is longer than the measurement cycle.
Standard value: "-"
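The standard values listed so far can be summarized in a short sketch that fills in omitted keys of a logbook configuration. This is an assumption-laden illustration: the names `LOGBOOK_DEFAULTS` and `with_defaults` are hypothetical, and the example assumes that "logbooks" sits at the top level of the configuration object.

```python
import json

# Standard values as stated in the parameter descriptions above.
LOGBOOK_DEFAULTS = {
    "filename": None,                            # null: no real file is written
    "cycle_time": {"value": 15, "unit": "min"},
    "max_entries": 30,
    "missing_data": "-",
}


def with_defaults(logbook):
    """Return a copy of the logbook config with omitted keys set to defaults."""
    merged = dict(LOGBOOK_DEFAULTS)
    merged.update(logbook)
    return merged


# Hypothetical configuration that only overrides the cycle time.
config = json.loads(
    '{"logbooks": [{"cycle_time": {"value": 1, "unit": "h"}, "columns": []}]}'
)
logbooks = [with_defaults(entry) for entry in config["logbooks"]]
```

In this example the resulting logbook is virtual (no filename), keeps at most 30 entries, and uses a one-hour cycle.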
"columns": JSON array where all the columns for this logbook are defined. Please refer to the following section on how to set up logbook columns.
Logbook columns
The "columns" key (as a child node of a logbooks object) always holds a JSON array that can contain the definitions of any number of columns. The very first column of any logbook is always a time stamp that shows when the statistics were calculated. All the remaining columns are configured here.
"columns": [
  ...
  {
    "title": "Temperature",
    "unit": "°C",
    "sensor_id": "Weather/Temperature",
    "mqtt_publish": "House/Weather/Temperature/mean",
    "homematic_publish": "34572",
    "operation": "mean",
    "evaluation_period": {"value": 1, "unit": "h"},
    "confidence_absolute": 10.0
  },
  {
    "title": "Humidity",
    "unit": "%rel",
    "sensor_id": "Weather/Humidity",
    "mqtt_publish": "House/Weather/Humidity/median",
    "homematic_publish": "37856",
    "operation": "median",
    "confidence_sigma": 3.0
  },
  {
    "title": "Wind",
    "unit": "Hz",
    "sensor_id": "Weather/Wind",
    "mqtt_publish": "House/Weather/Wind/frequency",
    "homematic_publish": "22156",
    "operation": "freq",
    "count_factor": 0.5
  },
  ...
]
"title": The column’s title, as displayed in the first (commented) header line of the logbook file.
"unit": The column’s unit for all of its values, as displayed in square brackets next to the title in the first header line of the logbook file.
Standard value: null
"sensor_id": A unique string that identifies the sensor that shall be evaluated here. The ID was specified in previous sections when setting up the individual sensors.
"mqtt_publish": Topic that is used to publish the result of the statistical operation via MQTT.
Standard value: null
"homematic_publish": ISE ID of the HomeMatic system variable that should be set to the result of the statistical operation.
Standard value: null
"operation": Statistical operation that shall be applied to the values of the most recent measurement cycle in order to calculate the column value. The following operations are allowed:
- "mean" — Arithmetic mean
- "median" — Median value
- "max" — Maximum
- "min" — Minimum
- "sum" — Sum
- "stddev" — Standard deviation (RMSD) around the arithmetic mean
- "stddev_mean" — Standard deviation (RMSD) around the arithmetic mean
- "stddev_median" — Standard deviation (RMSD) around the median value
- "count" — Number of measurements, or number of events in case of an externally triggered sensor.
- "freq" — Frequency of the incoming measurements, in 1/s.
- "freq_min" — Minimum overall frequency that has occurred during the last measurement cycle. This is the inverse of the maximum time between two incoming events or measurements. In 1/s.
- "freq_max" — Maximum overall frequency that has occurred during the last measurement cycle. This is the inverse of the minimum time between two incoming events or measurements. In 1/s. Warning: If you want to use this for a pulse counter, be aware that events can only be time-tagged once they arrive at the Sensorlogger. The actual point in time of the pulse creation is not known. Latencies, especially when receiving values over the network or even via USB, can lead to event showers and have strong effects on the maximum and minimum frequency.
Standard value: "mean"
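The value-based operations above can be reproduced in Python to cross-check logbook entries. This is a sketch under assumptions: the names `rmsd` and `OPERATIONS` are hypothetical, and the standard deviations are computed as a root-mean-square deviation that divides by the number of values N. Whether Sensorlogger divides by N or N-1 is not stated here.

```python
import math
import statistics


def rmsd(values, center):
    """Root-mean-square deviation of the values around a given center."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / len(values))


# Hypothetical lookup table from operation name to a Python callable.
OPERATIONS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
    "min": min,
    "sum": sum,
    "stddev": lambda v: rmsd(v, statistics.fmean(v)),
    "stddev_mean": lambda v: rmsd(v, statistics.fmean(v)),
    "stddev_median": lambda v: rmsd(v, statistics.median(v)),
    "count": len,
}

values = [1.0, 2.0, 3.0, 10.0]
results = {name: op(values) for name, op in OPERATIONS.items()}
```

With a single high value in the data, the mean (4.0) lies well above the median (2.5), and the deviation around the median comes out larger than the deviation around the mean.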
"evaluation_period": Length of the time period that shall be evaluated. If not defined, this will be the logbook’s cycle time, but you can set any other value here. For example, this parameter can be used to calculate a moving average. The numerical part for this parameter is set under "value", its unit under "unit". The following units are allowed: "ms", "s", "min", "h", "d".
Standard value: null
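The moving-average use case can be sketched as follows: the logbook writes an entry every cycle, but each entry evaluates all samples from a longer window. The function name `moving_mean`, the half-open window boundaries, and the sample data are assumptions made for illustration.

```python
import statistics


def moving_mean(samples, now_s, evaluation_period_s):
    """Mean of all values whose timestamp lies in (now - period, now].

    samples: iterable of (timestamp_s, value) pairs.
    Returns None if no sample falls into the window.
    """
    window = [v for t, v in samples if now_s - evaluation_period_s < t <= now_s]
    return statistics.fmean(window) if window else None


# Hypothetical samples, one every 30 minutes (timestamps in seconds).
samples = [(0, 1.0), (1800, 2.0), (3600, 3.0), (5400, 5.0)]

# Evaluate the last hour at t = 5400 s: the samples at 3600 s and 5400 s count.
hourly_mean = moving_mean(samples, now_s=5400, evaluation_period_s=3600)
```

With a cycle time of 15 minutes and an evaluation period of 1 hour, consecutive entries would share most of their samples, which is what smooths the resulting curve.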
"count_factor": This factor can be used to weight the number of events of a pulse counter. It only affects the counter operations "count", "freq", "freq_min" and "freq_max". In the example above, the wind sensor triggers two pulses per rotation, so we scale the pulse frequency by a factor of 0.5 to get the rotation frequency.
Standard value: 1
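The counter operations and the effect of "count_factor" can be illustrated with a small calculation on event arrival times. This sketch rests on assumptions: `frequency_stats` is a hypothetical helper, "freq" is taken to be the number of events divided by the evaluation period, and the factor is applied as a plain multiplier to all four results.

```python
def frequency_stats(arrival_times_s, period_s, count_factor=1.0):
    """Counter statistics from sorted event arrival times (in seconds)."""
    n = len(arrival_times_s)
    stats = {
        "count": count_factor * n,
        "freq": count_factor * n / period_s,  # assumed: events per period
    }
    # Time gaps between consecutive events; zero gaps are skipped
    # because their inverse is undefined.
    gaps = [b - a for a, b in zip(arrival_times_s, arrival_times_s[1:]) if b > a]
    if gaps:
        stats["freq_min"] = count_factor / max(gaps)  # longest gap
        stats["freq_max"] = count_factor / min(gaps)  # shortest gap
    return stats


# Hypothetical wind sensor: two pulses per rotation, hence count_factor = 0.5.
pulses = [0.0, 0.5, 1.0, 3.0]
wind = frequency_stats(pulses, period_s=4.0, count_factor=0.5)
```

Here four pulses in four seconds correspond to two rotations, so the rotation frequency is 0.5 Hz, with a minimum of 0.25 Hz (2 s gap) and a maximum of 1 Hz (0.5 s gap).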
"confidence_absolute": This parameter can be used to reduce the influence of outliers on the statistical result, for example when calculating the mean value. Any measurements that deviate by more than a given absolute value a from the median value μ are not considered during the statistical analysis. This means that we define a confidence interval μ±a which contains all values that are relevant for the statistical operation for this column.
This outlier reduction technique will be applied before the following operations: "mean", "median", "max", "min", "sum", "stddev", "stddev_mean" and "stddev_median". If the parameter is omitted or set to 0 or null, all of the collected values are considered and outlier reduction is turned off.
Standard value: null
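The absolute confidence interval can be sketched in a few lines. The function name `filter_absolute` and the sample data are hypothetical, and the sketch assumes that values lying exactly on the interval boundary are kept.

```python
import statistics


def filter_absolute(values, a):
    """Keep only values within median ± a; a of None or 0 disables filtering."""
    if not a:
        return list(values)
    mu = statistics.median(values)
    return [v for v in values if abs(v - mu) <= a]


# Hypothetical temperature readings with one obvious outlier.
readings = [20.1, 20.3, 19.9, 85.0, 20.0]
kept = filter_absolute(readings, a=10.0)

raw_mean = statistics.fmean(readings)
filtered_mean = statistics.fmean(kept)
```

The median of the readings is 20.1, so with a = 10.0 the reading of 85.0 falls outside the interval and the mean drops from about 33.1 to about 20.1.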
"confidence_sigma": This parameter can be used to reduce the influence of outliers on the statistical result, for example when calculating the mean value. It specifies the size of the confidence interval around the median value μ in units of the standard deviation σ (RMSD around the median). This means that with this parameter, a factor f can be set such that all values in the interval μ±(f·σ) are considered for the statistical operation.
Note that outliers can significantly increase the standard deviation σ, depending on the total number of measurements. This means that the factor f should not be chosen too high to avoid including outliers in the confidence interval. But f should also not be too low in order to avoid elimination of too many values.
In certain cases it can happen that this outlier reduction technique eliminates all values, e.g. when the confidence interval is too narrow. In this case, the only representative for the measurement cycle is the original median value of the collected data points. This would not necessarily have a bad effect on averaging, but it strongly affects sums, minimums, maximums and standard deviations.
To get a better feel for what is happening here, you can set up a second column without a confidence interval, or another column that lists the actual standard deviation of all measurements, to see some typical values for your use case.
This outlier reduction technique will be applied before the following operations: "mean", "median", "max", "min", "sum", "stddev", "stddev_mean" and "stddev_median". If the parameter is omitted or set to 0 or null, all of the collected values are considered and outlier reduction is turned off.
If both methods are defined, "confidence_sigma" takes precedence and the value set for "confidence_absolute" (previous point) is ignored.
Standard value: null
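The sigma-based interval, including the fallback to the median when every value is eliminated, can be sketched as follows. The name `filter_sigma` is hypothetical, σ is computed as the RMSD around the median divided by N, and boundary values are assumed to be kept.

```python
import math
import statistics


def filter_sigma(values, f):
    """Keep only values within median ± f·σ, with σ as RMSD around the median.

    f of None or 0 disables filtering. If nothing survives, the original
    median is returned as the only representative.
    """
    if not f:
        return list(values)
    mu = statistics.median(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    kept = [v for v in values if abs(v - mu) <= f * sigma]
    return kept if kept else [mu]


# Hypothetical readings: the outlier inflates sigma to roughly 29.
readings = [20.1, 20.3, 19.9, 85.0, 20.0]
wide = filter_sigma(readings, f=3.0)    # interval too wide, outlier survives
narrow = filter_sigma(readings, f=2.0)  # outlier is removed

# With an even number of values the median lies between two data points,
# so a very narrow interval can eliminate everything.
fallback = filter_sigma([1.0, 3.0], f=0.5)
```

In this example f = 3.0 still includes the outlier because the outlier itself inflates σ, while f = 2.0 removes it. This mirrors the advice above not to choose f too high.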