madoro

Reputation: 55

average range of data and plot in gnuplot

I have this kind of data:

label-> 1   2   3   4   5
val1    1.67E+07    2.20E+07    3.04E+07    7.89E+07    1.24E+08
val2    1.71E+07    2.35E+07    2.70E+07    7.80E+07    1.31E+08
val3    1.48E+07    2.15E+07    2.74E+07    7.18E+07    1.17E+08
val4    1.57E+07    2.07E+07    2.49E+07    7.46E+07    1.27E+08
val5    1.32E+07    2.23E+07    3.07E+07    7.50E+07    1.16E+08

I need to plot the label vs the average of each val column, like this:

label-> 1   2   3   4   5
val1    1.67E+07    2.20E+07    3.04E+07    7.89E+07    1.24E+08
val2    1.71E+07    2.35E+07    2.70E+07    7.80E+07    1.31E+08
val3    1.48E+07    2.15E+07    2.74E+07    7.18E+07    1.17E+08
val4    1.57E+07    2.07E+07    2.49E+07    7.46E+07    1.27E+08
val5    1.32E+07    2.23E+07    3.07E+07    7.50E+07    1.16E+08
mean    1.55E+07    2.20E+07    2.81E+07    7.57E+07    1.23E+08

Is it possible to perform this operation in gnuplot, or should I stick with Excel?

Upvotes: 0

Views: 971

Answers (2)

theozh

Reputation: 26123

Although this is a rather old question, it has no accepted answer, so here is a simple gnuplot-only version. Calculate the means in a do for loop using stats and store the values in a string. Check help do and help stats.

Data: SO31878011.dat

label-> 1   2   3   4   5
val1    1.67E+07    2.20E+07    3.04E+07    7.89E+07    1.24E+08
val2    1.71E+07    2.35E+07    2.70E+07    7.80E+07    1.31E+08
val3    1.48E+07    2.15E+07    2.74E+07    7.18E+07    1.17E+08
val4    1.57E+07    2.07E+07    2.49E+07    7.46E+07    1.27E+08
val5    1.32E+07    2.23E+07    3.07E+07    7.50E+07    1.16E+08

Script: (works for gnuplot>=4.6.5, Feb. 2014)

### plot data of columns and their average
reset

FILE = "SO31878011.dat"

N     = 5
means = ''
do for [col=2:N+1] {
    stats FILE every ::1 u col nooutput
    means = sprintf("%s %g",means, STATS_mean)
}

set key out Left reverse noautotitle

plot for [col=1:N] FILE u 0:col+1:xtic(1) w lp ti columnheader, \
     for [i=1:N] '' every ::1 u 0:(real(word(means,i))) w l \
         lc 0 lt 0 ti sprintf("Mean %d: %g",i,real(word(means,i)))
### end of script


Upvotes: 0

John_West

Reputation: 2399

You could do it using awk and gnuplot. Assume your example data (without the mean row) is in data.txt. Calculate the mean of each column, starting from the second column (i=2) and the second row. The first row (NR==1) holds the labels, so it is not summed; instead, the auxiliary array a is filled with zeroes (a[i]=0.0). An awk condition handles this: if (NR==1)... else {...accumulate the sums...}. Awk reads the data row by row. In each row, iterate over the fields and add the value in column i to the array element a[i]:

{for(i=2;i<=NF;i++) a[i]+=$i;}

The first row (NR==1) is not summed. In the END block of the awk script (all rows processed), divide each sum by the number of data rows, NR-1, to get the mean values. For the square 5x5 example, NR-1 happens to equal the number of value columns, NF-1. Note that the code below assumes rectangular data (NF constant).

Also, save the column labels from the first row into the label array: if (NR==1) {for(i=2;i<=NF;i++) label[i]=$i; ... }

Then print the labels and mean values, one row per label:

for(i=2;i<=NF;i++) {printf label[i]" "; print a[i]/(NR-1)}

The final data table would look like this:

1 15500000
2 22000000
3 28080000
4 75660000
5 123000000
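
As a rough sketch of the steps described above, the awk program could also be written out as a standalone shell snippet. The output file name means.dat and the variable names sum, rows and ncols are illustrative choices, not part of the original answer. This variant counts the data rows explicitly and divides by that count, so it would also cope with tables that are not square or that end with blank lines.

```shell
# Recreate the sample data from the question (header row + 5 data rows).
cat > data.txt <<'EOF'
label-> 1   2   3   4   5
val1    1.67E+07    2.20E+07    3.04E+07    7.89E+07    1.24E+08
val2    1.71E+07    2.35E+07    2.70E+07    7.80E+07    1.31E+08
val3    1.48E+07    2.15E+07    2.74E+07    7.18E+07    1.17E+08
val4    1.57E+07    2.07E+07    2.49E+07    7.46E+07    1.27E+08
val5    1.32E+07    2.23E+07    3.07E+07    7.50E+07    1.16E+08
EOF

# LC_NUMERIC=C makes awk read "1.67E+07" with a dot as the decimal separator.
LC_NUMERIC=C awk '
# Header row: remember the column labels and the column count, then skip it.
NR == 1 { for (i = 2; i <= NF; i++) label[i] = $i; ncols = NF; next }
# Data rows: accumulate one sum per column and count the rows.
NF > 1  { for (i = 2; i <= NF; i++) sum[i] += $i; rows++ }
# After the last row: print "label mean", one line per column.
END     { if (rows > 0) for (i = 2; i <= ncols; i++) print label[i], sum[i] / rows }
' data.txt > means.dat

cat means.dat
```

For the sample data, means.dat should match the table above, and gnuplot could then read it directly, for example with plot "means.dat" w lp.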

Then you could plot one column against the other. Note, the final data for gnuplot should be formatted in columns, not rows. The following code performs the described operations:

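gnuplot> unset key
gnuplot> plot "<export LC_NUMERIC=C; awk '{if (NR==1) {for(i=2;i<=NF;i++) {label[i]=$i; a[i]=0.0;}} else {for(i=2;i<=NF;i++) a[i]+=$i;};} END {for(i=2;i<=NF;i++) {printf label[i]\" \"; print a[i]/(NR-1)}};' data.txt"

Note that the double quotes inside the awk program must be escaped with a backslash (\") because the whole command is a double-quoted gnuplot string. The sums are divided by NR-1, the number of data rows; for the square example data this equals NF-1.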

Upvotes: 0
