Paul G.

Reputation: 479

gnuplot stats range possible?

I want gnuplot to do the stats function only for a given range of the data.

My data looks like:

24.12.2014-08:00,34,35,44
25.12.2014-08:00,33,35,44
26.12.2014-08:00,32,32,48
27.12.2014-08:00,31,36,41
28.12.2014-08:00,34,35,44

I now have this in my plot script:

...
set datafile separator ","
stats 'out.csv' u 2 prefix "A"
set xdata time
set timefmt "%d.%m.%Y-%H:%M"
set format x "%d.%m"
set xrange["24.12.2014":"28.12.2014"]
set label 1 gprintf("Max = %g", A_max) font "-Bold" at "24.12.2014",A_max-1
...

but this calculates the stats for all dates. I only want the range from 26.12 to 28.12 for the stats calculation, and the whole range for the actual chart, because I want to split my chart into different time periods, each with its own stats.

Upvotes: 4

Views: 3866

Answers (2)

Remo Tomasi

Reputation: 1

This was my situation:

2019-04-16 03:00 11.428
2019-04-16 06:00 13.952
2019-04-16 09:00 17.715
2019-04-16 12:00 18.901
2019-04-16 15:00 18.25 
2019-04-16 18:00 13.735
2019-04-16 21:00 12.05 
2019-04-17 00:00 11.297
2019-04-17 03:00 10.85 
2019-04-17 06:00 13.75 
2019-04-17 09:00 17.55 
2019-04-17 12:00 18.75 
2019-04-17 15:00 17.35 
2019-04-17 18:00 13.35 
2019-04-17 21:00 11.85 
2019-04-18 00:00 11.685
2019-04-18 03:00 11.379
2019-04-18 06:00 13.772
2019-04-18 09:00 17.359
2019-04-18 12:00 19.059
2019-04-18 15:00 18.101
2019-04-18 18:00 13.549
2019-04-18 21:00 12.75 
2019-04-19 00:00 12.622
2019-04-19 03:00 12.55 
2019-04-19 06:00 14.95 
2019-04-19 09:00 18.15 
2019-04-19 12:00 19.15 
2019-04-19 15:00 17.914
2019-04-19 18:00 14.114
2019-04-19 21:00 13.371
2019-04-20 00:00 12.977
2019-04-20 03:00 12.959
2019-04-20 06:00 15.331
2019-04-20 09:00 19.112
2019-04-20 12:00 20.271
2019-04-20 15:00 19.25 
2019-04-20 18:00 14.337
2019-04-20 21:00 12.216
2019-04-21 00:00 11.584
2019-04-21 03:00 10.945
2019-04-21 06:00 15.281
2019-04-21 09:00 18.093
2019-04-21 12:00 18.85 

Following Matthew's answer, I used something similar, adapted to my date format:

set timefmt "%Y-%m-%d %H:%M"
stats [time(0):time(0) + 5*24*60*60] 'out.csv' u (timecolumn(1)):2

time(0) returns the current system time and serves as the starting point; the end point is calculated by adding 5*24*60*60 = 432000 seconds (five days) to it.

In the end, I obtained these stats:

* FILE:
  Records:           40
  Out of range:       4
  Invalid:            0
  Blank:              0
  Data Blocks:        1

* COLUMNS:
  Mean:          1.55562e+09             2.5214
  Std Dev:       124668.6809             2.0668
  Sample StdDev: 126256.8810             2.0931
  Skewness:           0.0000            -0.2736
  Kurtosis:           1.7985             2.3318
  Avg Dev:       108000.0000             1.7471
  Sum:           6.22246e+10           100.8571
  Sum Sq.:       9.67976e+19           425.1651

  Mean Err.:      19711.8492             0.3268
  Std Dev Err.:   13938.3823             0.2311
  Skewness Err.:      0.3873             0.3873
  Kurtosis Err.:      0.7746             0.7746

  Minimum:       1.55541e+09 [ 0]       -1.8791 [ 0]
  Maximum:       1.55583e+09 [39]        6.6000 [38]
  Quartile:      1.55551e+09             1.4092
  Median:        1.55562e+09             2.7873
  Quartile:      1.55572e+09             4.2904

  Linear Model:       y = 4.758e-06 x - 7399
  Slope:              4.758e-06 +- 2.576e-06
  Intercept:          -7399 +- 4008
  Correlation:        r = 0.287
  Sum xy:             1.569e+11

As you can see, in the stats output the date is expressed in seconds since January 1st, 1970. This lets me find where the max/min and other useful values are located.

Upvotes: 0

Matthew

Reputation: 7590

The stats function does not like time data, but you can force it to work with time data using the various functions for manipulating times. Two methods for doing this are provided.

Method 1

startrange = strptime("%d.%m.%Y","26.12.2014")
endrange = strptime("%d.%m.%Y","29.12.2014")
validdate(x) = (curdate=strptime("%d.%m.%Y-%H:%M",x),curdate>=startrange&&curdate<endrange)
stats 'out.csv' u (validdate(strcol(1))?$2:1/0) prefix "A"

Which produces

* FILE: 
  Records:           3
  Out of range:      0
  Invalid:           2
  Blank:             0
  Data Blocks:       1

* COLUMN: 
  Mean:              32.3333
  Std Dev:            1.2472
  Sample StdDev:      1.5275
  Skewness:           0.3818
  Kurtosis:           1.5000
  Avg Dev:            1.1111
  Sum:               97.0000
  Sum Sq.:         3141.0000

  Mean Err.:          0.7201
  Std Dev Err.:       0.5092
  Skewness Err.:      1.4142
  Kurtosis Err.:      2.8284

  Minimum:           31.0000 [1]
  Maximum:           34.0000 [2]
  Quartile:          31.0000 
  Median:            32.0000 
  Quartile:          34.0000

on your sample data (the first two lines are out of range and the last three are not). Here we force out of range values to be invalid, thus we show 0 out of range.

The way that this works is that we use the strptime function which converts a date into an internal representation (in gnuplot 5, this is the number of seconds since the Unix Epoch, and is the number of seconds since Jan 1st, 2000 in versions prior). The first two lines thus get the internal value of midnight on December 26th, 2014 and midnight on December 29th, 2014 (we adjust to the next day so that we can fit all of December 28th in range).

The validdate function converts the date of interest to the internal representation and compares it to these markers. It returns 1 (true) if the date is in range and 0 (false) if it isn't. Note that the first comparison uses >= to test whether the date is at or after midnight of the start date, and the second uses a strict < to check whether the date is before the start of the day after the end date. If you have specific times in mind on those days, slight modifications can be made.
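As a sketch of such a modification, the bounds can carry a time of day by parsing them with the full format. The variable fmt and the 08:00 bounds are assumptions chosen for illustration, and the upper comparison is changed to <= so that the end time itself is included:

```gnuplot
set datafile separator ","

# fmt is a helper variable introduced for this example only
fmt = "%d.%m.%Y-%H:%M"

# bounds now include a time of day (hypothetical values)
startrange = strptime(fmt, "26.12.2014-08:00")
endrange   = strptime(fmt, "28.12.2014-08:00")

# inclusive on both ends: start <= date <= end
validdate(x) = (curdate = strptime(fmt, x), curdate >= startrange && curdate <= endrange)

stats 'out.csv' u (validdate(strcol(1)) ? $2 : 1/0) prefix "A"
```

On the sample data from the question this should select the same three rows (26.12 through 28.12), since each of them is stamped 08:00.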

Finally, we run the stats command on a conditional value. If the date in the first column (we need to use the strcol function to load it as a string to feed to the validdate function) is in range, we use the second column value. If the date is not in range, we use the invalid value 1/0. The stats function will not use the invalid values in its analysis.


Additionally, if it is more convenient, we can accept the start and end dates as parameters in the function:

validdate(x,start,end) = (startrange=strptime("%d.%m.%Y",start),endrange=strptime("%d.%m.%Y",end),curdate=strptime("%d.%m.%Y-%H:%M",x),curdate>=startrange&&curdate<endrange)

and call the stats function like

stats 'out.csv' u (validdate(strcol(1),"26.12.2014","29.12.2014")?$2:1/0) prefix "A"
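Since the goal in the question is to get stats for several time periods, the parameterized form can be called once per period with a different prefix. This is only a sketch: the prefixes P1 and P2 and the period boundaries are assumptions for illustration, and nooutput merely suppresses the printed table.

```gnuplot
set datafile separator ","

validdate(x,start,end) = (startrange=strptime("%d.%m.%Y",start), endrange=strptime("%d.%m.%Y",end), curdate=strptime("%d.%m.%Y-%H:%M",x), curdate>=startrange && curdate<endrange)

# period 1: 24.12 and 25.12 (the end date is exclusive)
stats 'out.csv' u (validdate(strcol(1),"24.12.2014","26.12.2014")?$2:1/0) prefix "P1" nooutput

# period 2: 26.12 through 28.12
stats 'out.csv' u (validdate(strcol(1),"26.12.2014","29.12.2014")?$2:1/0) prefix "P2" nooutput

# each call stores its results under its own prefix
print sprintf("P1: mean = %g, max = %g", P1_mean, P1_max)
print sprintf("P2: mean = %g, max = %g", P2_mean, P2_max)
```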

Method 2

Gnuplot has a timecolumn function which can read a column as a time and date. This gives us an alternative method which is simpler, but not necessarily as powerful.

We can do

set timefmt "%d.%m.%Y-%H:%M"
stats [startrange:endrange] 'out.csv' u (timecolumn(1)):2

This will read the first column as a time using the timefmt.

This version works similarly to the above, except the endrange value is accepted instead of rejected (the above version is more powerful if we need more complex tests of our dates and times) and the discarded values are listed as "Out of range" instead of "Invalid".

We can also specify the start and end range inline using

stats [strptime("%d.%m.%Y","26.12.2014"):strptime("%d.%m.%Y","29.12.2014")] 'out.csv' u (timecolumn(1)):2

Note that you MUST NOT be in time mode when using the stats function, otherwise it will just complain. Thus, the above must be run before calling set xdata time, or after restoring normal mode with set xdata.
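Putting this ordering together with the script from the question might look like the sketch below. The variables t0 and t1, the label position, and the plot command are assumptions for illustration. Note that with a two-column stats call the variable names get _x and _y suffixes, so the maximum of the data column is A_max_y rather than A_max.

```gnuplot
set datafile separator ","
set timefmt "%d.%m.%Y-%H:%M"

# run stats first, while the x axis is still in normal (non-time) mode
t0 = strptime("%d.%m.%Y", "26.12.2014")
t1 = strptime("%d.%m.%Y", "29.12.2014")
stats [t0:t1] 'out.csv' u (timecolumn(1)):2 prefix "A"

# only now switch to time mode and plot the whole range
set xdata time
set format x "%d.%m"
set xrange ["24.12.2014-00:00":"28.12.2014-23:59"]
set label 1 gprintf("Max = %g", A_max_y) at "26.12.2014-08:00", A_max_y + 0.5
plot 'out.csv' u 1:2 w lp title "column 2"
```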

In version 5, the timecolumn function can also take an additional argument which specifies the format to use, as in timecolumn(1,"%d.%m.%Y-%H:%M"). With this form the set timefmt command is not necessary.
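A minimal sketch of the two-argument form, assuming gnuplot 5 and the same out.csv as in the question; no set timefmt line is needed here because the format is passed to timecolumn directly:

```gnuplot
set datafile separator ","

# gnuplot 5 only: the time format is given as the second argument
stats [strptime("%d.%m.%Y","26.12.2014"):strptime("%d.%m.%Y","29.12.2014")] 'out.csv' u (timecolumn(1,"%d.%m.%Y-%H:%M")):2 prefix "A"
```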

Note that in version 5, only the two argument form is documented; the one argument form is mentioned in the documentation only as the previous format, not as an acceptable alternative. The one argument form continues to work for now, but it is possible that it may stop working in some later version. However, I would expect this to be unlikely, as gnuplot tends to preserve backwards compatibility, and the one argument form is useful in cases like the above (so that the time format specification need only occur in one place in the script).

Upvotes: 4
