Reputation: 3099
I have a set of directories:
RUN1 RUN2 RUN3
Within each of those directories, I have files. RUN1 has:
mod1_1 mod1_2 mod1_3
and RUN2 has:
mod2_1 mod2_2 mod2_3
etc.
Each file has lines like this (this is mod1_1):
8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.69e-01 2.56e-01 7.84e-01 4.95e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
And this is mod1_2:
8.69e-01 2.59e-01 7.82e-01 4.98e-01
8.69e-01 2.56e-01 7.84e-01 4.90e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
I want to create a new file that contains only the line with the smallest value in column 4 from each mod file. For example, suppose mod1_1 and mod1_2 are the only files. I want to create a new file that contains line 1 from mod1_1 and line 2 from mod1_2:
8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.69e-01 2.56e-01 7.84e-01 4.90e-01
I would like to do this for each RUN directory. I have tried this:
#/bin/bash
finddir=$(find -type d -name 'RUN*' | sort) #find the dirs
for i in $finddir; do
cd $i
echo $(pwd)
findfiles=$(find -type f -name 'mod*' | sort -V) #find the files
echo $findfiles
for j in $findfiles; do
s1=$(sort -k3,3 j)
echo $s1
done
My problem is the sort command, and I don't know how to write the results to a file. Any ideas?
Pseudocode in case it's helpful:
For each directory RUN*
For each file mod*
get the minimum value in column 4, save the line that has that value
End for
Write the lines that had the minimum values to a new file
End for
EDIT: Still having issues. Here's how I've modified:
#/bin/bash
finddir=$(find -type d -name 'RUN*' | sort) #find the dirs
for i in $finddir; do
cd $i
echo $(pwd)
findfiles=$(find -type f -name 'mod*' | sort -V) #find the files
for j in $findfiles; do
s1=$(sort -k 4 -g $j)
echo -n "$s1"
done
cd ..
done
I was 'cd'ing in the wrong part. This is a bit better - it gives me the four numbers on each line - but it's not returning only the line with the smallest value of column 4 from each file. Also, I still don't know how to export the final results to a new file.
Upvotes: 2
Views: 139
Reputation: 11469
For each of these files, 1_1 or 1_2, the following command should give you the row that has the lowest number in the 4th column of that file:
~]$ cat 1_2
8.69e-01 2.59e-01 7.82e-01 4.98e-01
8.69e-01 2.56e-01 7.84e-01 4.90e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
Now use sort -k:
~]$ sort -k 4 1_2 | head -1
8.69e-01 2.56e-01 7.84e-01 4.90e-01
Without head -1, you should see that the lines are sorted according to the 4th column:
~]$ sort -k 4 1_2
8.69e-01 2.56e-01 7.84e-01 4.90e-01
8.69e-01 2.59e-01 7.82e-01 4.98e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
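#!/bin/bash
resultfile="somefile.txt"
for d in $(find . -type d -name 'RUN*');
do
# Sort each mod* file on its own and keep only its first (smallest) line.
# Piping the combined find output to head -1 would keep a single line per
# directory instead of one line per file.
find "$d" -type f -name 'mod*' -exec sh -c 'sort -k4 -g "$1" | head -n 1' sh {} \; >> "$resultfile"
done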
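As an alternative to sorting each file, awk can pick out the minimum in a single pass. This is only a sketch and not part of the script above: the function name min_line and the temporary sample file are made up for illustration, and it assumes column 4 always holds a number that awk can parse.

```shell
#!/bin/bash
# min_line FILE: print the first line of FILE whose 4th column is smallest.
# "$4 + 0" forces a numeric comparison, so 4.95e-02 ranks below 4.92e-01.
min_line() {
    awk 'NR == 1 || $4 + 0 < min { min = $4 + 0; line = $0 }
         END { if (NR) print line }' "$1"
}

# Throwaway sample file holding the values of mod1_2 from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
8.69e-01 2.59e-01 7.82e-01 4.98e-01
8.69e-01 2.56e-01 7.84e-01 4.90e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
EOF

min_line "$sample"   # prints: 8.69e-01 2.56e-01 7.84e-01 4.90e-01
rm -f "$sample"
```

If several lines share the minimum, this keeps the first one it encounters.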
Upvotes: 1
Reputation: 383
There are a couple of problems:
1.) Use $j instead of j in the sort command.
2.) Quote your variables on echo (see How do I preserve line breaks when storing a command output to a variable in bash? for details).
3.) You cd into a directory but never go back, and you probably want to go back.
I tested a simpler version of your code (without going into directories), and that works:
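The quoting issue in point 2 can be seen with a small experiment. This is only an illustrative sketch; the variable names and the sample text are made up for the demonstration:

```shell
#!/bin/bash
# Hypothetical two-line value standing in for the output of sort.
out=$(printf '%s\n' 'line one' 'line two')

# Unquoted: the shell splits the value on whitespace and echo joins the
# pieces with single spaces, so the line break is lost.
unquoted=$(echo $out)

# Quoted: the value reaches echo as one argument, newline included.
quoted=$(echo "$out")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted form prints everything on one line, which is why the sorted rows in the question ran together.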
#!/bin/bash
findfiles=$(find -type f -name 'mod*' | sort -V) #find the files
for j in $findfiles; do
echo $j
s1=$(sort -k 4 -g $j)
echo "$s1"
done
Note that I used sort -g so that floating-point values are handled properly. For example, if you change your data to the following (using 4.95e-02 instead of 4.95e-01 in the second row):
8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.69e-01 2.56e-01 7.84e-01 4.95e-02
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
then without -g the order will be wrong:
$ cat test.dat | sort -k 4
8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.69e-01 2.56e-01 7.84e-01 4.95e-02
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
Using -g instead, sort handles the exponent correctly:
$ cat test.dat | sort -k 4 -g
8.69e-01 2.56e-01 7.84e-01 4.95e-02
8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
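Putting the pieces together, one possible way to get one result file per RUN directory is sketched below. It is not a tested drop-in solution: the function name collect_minima and the output name min_lines.txt are made up, it assumes a sort that supports -g (as GNU sort does), and it uses shell globs instead of find, so files are visited in plain glob order (mod1_10 before mod1_2) rather than version order.

```shell
#!/bin/bash
# collect_minima DIR: for every mod* file in DIR, write the line with the
# smallest 4th column to DIR/min_lines.txt (a hypothetical output name).
collect_minima() {
    local dir=$1 out="$1/min_lines.txt" f
    : > "$out"                      # start with an empty result file
    for f in "$dir"/mod*; do
        [ -f "$f" ] || continue     # skip if the glob matched nothing
        # LC_ALL=C keeps "." as the decimal point regardless of locale.
        LC_ALL=C sort -g -k4,4 "$f" | head -n 1 >> "$out"
    done
}

# Run it for every RUN* directory below the current one.
for d in RUN*/; do
    [ -d "$d" ] || continue
    collect_minima "${d%/}"
done
```

Because the loop never changes directory, there is no cd to undo, and the result file name does not match mod*, so it is not picked up as input on a second run.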
Upvotes: 1