Reputation: 55
I have this kind of data:
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
I need to plot the label vs the average of each val column, like this:
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
mean 1.55E+07 2.20E+07 2.81E+07 7.57E+07 1.23E+08
Is it possible to perform this operation in gnuplot, or should I stick with Excel?
Upvotes: 0
Views: 971
Reputation: 26123
Although this is a rather old question with no accepted answer, here is a simple gnuplot-only version. Calculate the means in a do for loop using stats and store the values in a string. Check help do and help stats.
Data: SO31878011.dat
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
Script: (works for gnuplot>=4.6.5, Feb. 2014)
### plot data of columns and their average
reset
FILE = "SO31878011.dat"
N = 5
means = ''
do for [col=2:N+1] {
stats FILE every ::1 u col nooutput
means = sprintf("%s %g",means, STATS_mean)
}
set key out Left reverse noautotitle
plot for [col=1:N] FILE u 0:col+1:xtic(1) w lp ti columnheader, \
for [i=1:N] '' every ::1 u 0:(real(word(means,i))) w l \
lc 0 lt 0 ti sprintf("Mean %d: %g",i,real(word(means,i)))
### end of script
Result:
Upvotes: 0
Reputation: 2399
You could do it using awk and gnuplot. Assume your example data (without the mean row) is in data.txt.
Then you could calculate the mean of each column, starting from the second column (i=2) and the second row. The first row (NR==1) holds the labels, so do not sum it; instead, fill the auxiliary array a with zeroes (a[i]=0.0). For that purpose one could use an awk condition: if (NR==1)... else {...calculate the means...}.
Awk reads the data row by row. In each row, you iterate over the fields and add the value from column i to array element a[i]:
{for(i=2;i<=NF;i++) a[i]+=$i;}
When iterating over the first row (NR==1), we do not sum anything; we only initialize a[i] to zero.
At the END of the awk script (all rows processed), divide each sum by the number of data rows, NR-1, to get the mean values. For this 5x5 example, NR-1 happens to equal NF-1, the number of data columns, but the two differ for non-square data. Note that the code below assumes rectangular data (NF=const).
Also, save the column labels from the first row into the label array:
if (NR==1) {for(i=2;i<=NF;i++) label[i]=$i; ... }
Then print the labels and mean values, one row per label.
for(i=2;i<=NF;i++) {printf label[i]" "; print a[i]/(NR-1)}
The final data table looks like this:
1 15500000
2 22000000
3 28080000
4 75660000
5 123000000
Then you could plot one column against the other. Note, the final data for gnuplot should be formatted in columns, not rows. The following code performs the described operations:
gnuplot> unset key
gnuplot> plot "<export LC_NUMERIC=C; awk '{if (NR==1) {for(i=2;i<=NF;i++) {label[i]=$i; a[i]=0.0;}} else {for(i=2;i<=NF;i++) a[i]+=$i;};} END {for(i=2;i<=NF;i++) {printf label[i]\" \"; print a[i]/(NR-1)}};' data.txt"
Note that the double quotes inside the awk command must be escaped with a backslash (\") in gnuplot, because the whole command is itself a double-quoted string.
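For readability, the same steps could also be kept in a separate awk file, which avoids the quote escaping in the one-liner. The following is only a sketch: the file names means.awk and means.txt are made up for illustration, the data is the example from the question, and the sums are divided by the number of data rows (NR-1), which equals NF-1 for this 5x5 table.

```shell
# Recreate the example data from the question (header row plus five data rows).
cat > data.txt <<'EOF'
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
EOF

# means.awk (hypothetical file name): prints one "label mean" pair per line.
cat > means.awk <<'EOF'
NR == 1 {                      # header row: remember the labels, do not sum
    ncols = NF
    for (i = 2; i <= NF; i++) label[i] = $i
    next
}
{                              # data rows: accumulate one sum per column
    for (i = 2; i <= NF; i++) sum[i] += $i
}
END {                          # NR - 1 data rows were summed
    # %.0f prints whole numbers; change the format for fractional data
    for (i = 2; i <= ncols; i++) printf "%s %.0f\n", label[i], sum[i] / (NR - 1)
}
EOF

# LC_NUMERIC=C keeps the decimal point independent of the user's locale.
LC_NUMERIC=C awk -f means.awk data.txt > means.txt
cat means.txt
```

In gnuplot, the result could then be plotted either from the file (plot "means.txt" using 1:2 with linespoints) or directly from the pipe (plot "< awk -f means.awk data.txt" using 1:2 with linespoints).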
Upvotes: 0