Reputation: 2263
I have a list of directories (e.g. 0, 50, 100, 150, 200, etc.), each one containing a file called zb_p.xy
with two columns of data. These are examples of such files:
# file 0/zb_p.xy
1 0.1
2 0.2
3 0.15
4 0.11
# file 50/zb_p.xy
1 0.0
2 0.4
3 0.30
4 0.1
I would like to extract column 2 from all the zb_p.xy files and plot its average across the files, together with its standard deviation, using gnuplot on Linux.
This was my attempt so far:
LIST = system("ls -1 */zb_p.xy*")
FILES = words(LIST)
FILE(i) = word(LIST,i)
plot for [i=1:FILES] FILE(i)
This code in MATLAB seems to work, but I need something similar in gnuplot:
D=dir('*');
[s ~]=size(D);
for i=1:s
dirName=D(i,1).name;
cd(dirName) %steps into directory
fileID=load('zb_p.xy');
zb(:,i)=fileID(:,2);
cd .. %steps out of directory
end
zb_mean=mean(zb,2);
zb_std=std(zb,0,2);
errorbar(zb_mean,zb_std/sqrt(s),'sk')
Upvotes: 2
Views: 2358
Reputation: 4325
If you want gnuplot itself to calculate the average (or some other function) of the input values, you can use a variant of my previous answer. It appends the selected column from every input file except the last to the lines of the last file:
# output a new table where column COL from each (but the last) input file is
# appended to the last input file
BEGIN {
FILENUM = ARGC - 1
}
{
if (ARGIND <= 1) { # first file or stdin
ADD[NR] = ""
}
if (ARGIND == FILENUM) { # last file
if (FILENUM > 1 && NF >= COL && $COL ~ /^-?[0-9]/)
$0 = $0 ADD[FNR];
print $0;
} else {
if (NF >= COL && $COL ~ /^-?[0-9]/) {
ADD[FNR] = ADD[FNR] FS $COL;
}
}
}
Applying it to the input samples, you will get the following.
input1:
720 0.403176
730 0.399838
# Lab = 73.45771 -0.552744 -2.636218
input2:
720 0.394166
730 0.391083
# Lab = 72.911591 -0.718176 -2.942526
input3:
720 0.364636
730 0.361698
# Lab = 70.623329 -0.713199 -2.19574
output from awk -f multi-column-add.awk -v COL=2 input{1,2,3}:
720 0.364636 0.403176 0.394166
730 0.361698 0.399838 0.391083
# Lab = 70.623329 -0.713199 -2.19574
So you could plot the mean using (($2+$3+$4)/3). If you feel like adding maximums, minimums, or errorbars, just go ahead.
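As a rough illustration of the "errorbars" idea, the appended table can be reduced to one mean and one sample standard deviation per row before it reaches gnuplot. This is only a sketch under a few assumptions: the temporary file name `appended` is hypothetical, the table is the three-file output shown above, and only rows whose first field looks numeric are treated as data.

```shell
#!/bin/sh
# Hypothetical sample: the table produced by multi-column-add.awk above.
tmp=$(mktemp -d)
cat > "$tmp/appended" <<'EOF'
720 0.364636 0.403176 0.394166
730 0.361698 0.399838 0.391083
# Lab = 70.623329 -0.713199 -2.19574
EOF

# For every data row print: key, mean of columns 2..NF, sample standard deviation.
stats=$(awk '
  $1 ~ /^-?[0-9]/ {
      n = NF - 1; sum = 0; ss = 0
      for (i = 2; i <= NF; i++) sum += $i
      mean = sum / n
      for (i = 2; i <= NF; i++) ss += ($i - mean) * ($i - mean)
      sd = (n > 1) ? sqrt(ss / (n - 1)) : 0
      printf "%s %.6f %.6f\n", $1, mean, sd
  }' "$tmp/appended")

printf '%s\n' "$stats"
rm -rf "$tmp"
```

The resulting three columns (key, mean, standard deviation) are in the layout that gnuplot's `with yerrorbars` style accepts via `using 1:2:3`.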
Upvotes: 1
Reputation: 4325
As I had exactly the same problem, but wasn't satisfied with the answers given, here's my own version:
I had two-column measurement files, where the first column is an index key and the second column is the measurement. My files also contain comment lines.
Necessary precondition: line n of every input file must correspond to the same measurement (all those values are averaged). Comment lines are not skipped when lines are counted, so they must sit at the same line numbers in every file!
My solution uses awk (gawk-4.2.1) to sum up all the values in the column specified as -v COL=n, where n is the 1-based column number. So the memory consumption should be proportional to the number of lines, not to the number of files being used.
The trick is to avoid splitting and joining input fields for the output, by misusing the last input file. OK, enough words, let's see the code:
# output a new table where column COL is the average of all input files
BEGIN {
FILENUM = ARGC - 1
}
{
if (ARGIND == 1) { # first file
SUM[NR] = 0
}
}
NF >= COL && $COL ~ /^-?[0-9]/ {
SUM[FNR] += $COL
}
{
if (ARGIND == FILENUM) { # last file
if (FILENUM > 1 && NF >= COL && $COL ~ /^-?[0-9]/)
$COL = SUM[FNR] / FILENUM;
print $0;
}
}
With input files input1, input2, and input3, I use the command
awk -f multi-column-mean.awk -v COL=2 input{1,2,3} >output
to create output. As a very simple test run, consider these example data:
input1:
720 0.403176
730 0.399838
# Lab = 73.45771 -0.552744 -2.636218
input2:
720 0.394166
730 0.391083
# Lab = 72.911591 -0.718176 -2.942526
input3:
720 0.364636
730 0.361698
# Lab = 70.623329 -0.713199 -2.19574
output:
720 0.387326
730 0.384206
# Lab = 70.623329 -0.713199 -2.19574
Note that the comment is unchanged from the last input file (input3).
Finally, an example plot with my full data (B1, B2, and B3 are the original input files, and Mean is the output file. The last two values are those shown in the example):
The case where there is only one input file is slightly optimized to output the file "as is". To avoid the warning awk: multi-column-mean.awk:11: (FILENAME=- FNR=1) Warnung: reference to uninitialized element 'SUM["1"]' in the "zero input files" case (reading standard input), replace the corresponding line with if (ARGIND <= 1) { # first file or stdin.
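ARGIND is a gawk extension, so the script above depends on gawk. As a hedged alternative, here is a sketch that counts files with FNR == 1 instead and also accumulates a sum of squares, so it can report a sample standard deviation next to the mean. It is not the script above: it prints only key, mean, and standard deviation at the end rather than rewriting the last input file, it assumes no input file is empty, and the variable names (nfiles, sumsq, key) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample files mirroring input1..input3 from the example.
tmp=$(mktemp -d)
printf '720 0.403176\n730 0.399838\n# Lab = 73.45771 -0.552744 -2.636218\n' > "$tmp/input1"
printf '720 0.394166\n730 0.391083\n# Lab = 72.911591 -0.718176 -2.942526\n' > "$tmp/input2"
printf '720 0.364636\n730 0.361698\n# Lab = 70.623329 -0.713199 -2.19574\n' > "$tmp/input3"

mean_sd=$(awk -v COL=2 '
  FNR == 1 { nfiles++ }                 # a new file starts: count it (no ARGIND needed)
  NF >= COL && $COL ~ /^-?[0-9]/ {      # same numeric test as the script above
      key[FNR]    = $1
      sum[FNR]   += $COL
      sumsq[FNR] += $COL * $COL
      if (FNR > last) last = FNR
  }
  END {
      for (i = 1; i <= last; i++) {
          if (!(i in sum)) continue     # line i held no numeric value in any file
          m = sum[i] / nfiles
          v = (nfiles > 1) ? (sumsq[i] - nfiles * m * m) / (nfiles - 1) : 0
          if (v < 0) v = 0              # guard against tiny negative rounding errors
          printf "%s %.6f %.6f\n", key[i], m, sqrt(v)
      }
  }' "$tmp/input1" "$tmp/input2" "$tmp/input3")

printf '%s\n' "$mean_sd"
rm -rf "$tmp"
```

Memory use still grows with the number of lines rather than the number of files, which is the property the original script was designed around.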
Mostly muscles, little fat, hope you like it ;-)
Upvotes: 1
Reputation: 2442
You can paste all the files side by side into one table using the following bash command:
# bash: paste filenames in directories 1, 2, and 3
paste */file.dat
# 1/file.dat # 2/file.dat # 3/file.dat
7 6 7 3 2 0
0 4 3 4 0 3
0 8 5 0 9 1
2 9 5 0 2 6
6 8 7 2 4 3
This output can be passed to gnuplot as a temporary file (with 6 columns), so that you can manipulate the columns to be plotted:
# gnuplot
data = "<( paste */file.dat )"
plot data u 1:(($2+$4+$6)/3.0) w lp pt 6 ps 2
EDIT: With the above approach and many files, the number of columns could be huge. The column manipulation can be automated with awk. The following awk script calculates the mean and standard deviation of each row over columns 2, 4, 6, and so on (suppose it is called mean.awk):
#!/usr/bin/awk -f
# script mean.awk
{
mean=0
std=0
# calculate mean
for(i=2; i<=NF; i+=2) mean += $i
mean /= 0.5*NF
# calculate standard dev
for(i=2; i<=NF; i+=2) std += ($i-mean)*($i-mean)
std = sqrt(std/(0.5*NF-1))
print mean, std
}
The bash-command to process your data is then
paste */file.dat | grep -v ^# | awk -f mean.awk
3 3
3.66667 0.57735
3 4.3589
5 4.58258
4.33333 3.21455
where the first and second columns are the mean value and the standard deviation, respectively. The grep command drops the lines beginning with the character #.
Finally, you can plot the std-dev versus the mean in gnuplot as:
data = "<( paste */file.dat | grep -v ^# | awk -f mean.awk )"
plot data u 1:2 w lp pt 6 ps 2
Example (not the best plot ever):
If you don't want to write an awk-script, this is the one-line command version:
data = "<( paste */file.dat | grep -v ^# | awk '{mean=0; std=0; for(i=2; i<=NF; i+=2) mean += $i; mean /= 0.5*NF; for(i=2; i<=NF; i+=2) std += ($i-mean)*($i-mean); std = sqrt(std/(0.5*NF-1)); print mean, std }' )"
plot data u 1:2 w lp pt 6 ps 2
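The MATLAB snippet in the question plots the standard deviation divided by sqrt(s), that is, the standard error of the mean. A possible adaptation of the pipeline above, offered as a sketch rather than a tested gnuplot recipe: the awk part prints the row number, the mean, and std/sqrt(n), where n is the number of pasted files. The sample directories 1, 2, 3 and their file.dat contents are recreated here from the example table purely for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: the three two-column files from the paste example.
tmp=$(mktemp -d)
mkdir "$tmp/1" "$tmp/2" "$tmp/3"
printf '7 6\n0 4\n0 8\n2 9\n6 8\n' > "$tmp/1/file.dat"
printf '7 3\n3 4\n5 0\n5 0\n7 2\n' > "$tmp/2/file.dat"
printf '2 0\n0 3\n9 1\n2 6\n4 3\n' > "$tmp/3/file.dat"

# Columns printed: row number, mean of columns 2,4,6,..., standard error of that mean.
sem=$(cd "$tmp" && paste */file.dat | grep -v '^#' | awk '{
    n = NF / 2; mean = 0; ss = 0
    for (i = 2; i <= NF; i += 2) mean += $i
    mean /= n
    for (i = 2; i <= NF; i += 2) ss += ($i - mean) * ($i - mean)
    sd = (n > 1) ? sqrt(ss / (n - 1)) : 0
    printf "%d %.4f %.4f\n", NR, mean, sd / sqrt(n)
}')

printf '%s\n' "$sem"
rm -rf "$tmp"

# In gnuplot, the same pipeline could be wrapped as in the answer above, e.g.
#   data = "<( paste */file.dat | grep -v ^# | awk -f sem.awk )"   # sem.awk: hypothetical file holding the awk body
#   plot data u 1:2:3 w yerrorbars pt 6 ps 2
```

Using the row number as x mirrors MATLAB's errorbar(zb_mean, ...) call, which plots against the index; if the first column of the data files is the real abscissa, printing $1 instead of NR would be the natural change.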
Upvotes: 3