Reputation: 169
I have a data file (data.txt) as shown below:
0 25 10 25000
1 25 7 18000
1 25 9 15000
0 20 9 1000
1 20 8 800
0 20 8 900
0 50 10 4000
0 50 5 2500
1 50 10 5000
I want to copy the rows with the same value in the second column to separate files. I want to get the following three files:
data.txt_25
0 25 10 25000
1 25 7 18000
1 25 9 15000
data.txt_20
0 20 9 1000
1 20 8 800
0 20 8 900
data.txt_50
0 50 10 4000
0 50 5 2500
1 50 10 5000
I have just started learning awk. I have tried the following bash script:
1 #!/bin/bash
2
3 for var in 20 25 50
4 do
5 awk -v var="$var" '$2==var { print $0 }' data.txt > data.txt_$var
6 done
While the bash script does what I want, it is time-consuming because I have to enter the second-column values manually on line 3.
So I would like to do this using awk. How can I achieve this using awk?
Thanks in advance.
Upvotes: 3
Views: 50
Reputation: 133428
Could you please try the following? It assumes that the numbers in your 2nd column are NOT sorted.
sort -s -k2,2 Input_file |
awk '
prev!=$2{
  close(output_file)
  output_file="data.txt_"$2
}
{
  print > (output_file)
  prev=$2
}'
If the 2nd column of your Input_file is already sorted (equal values sit next to each other), there is no need for sort; you can run awk directly:
awk '
prev!=$2{
  close(output_file)
  output_file="data.txt_"$2
}
{
  print > (output_file)
  prev=$2
}' Input_file
Explanation: a detailed explanation of the above.
sort -s -k2,2 Input_file |      ##Sort Input_file on the 2nd column only and pass the output to awk; -s (stable sort, a GNU/BSD extension) keeps the input order within each group.
awk '                           ##Start the awk program here.
prev!=$2{                       ##If the prev variable is NOT equal to $2, do the following.
  close(output_file)            ##Close the previous output file to avoid "too many open files" errors.
  output_file="data.txt_"$2     ##Set output_file to data.txt_ followed by $2.
}
{
  print > (output_file)         ##Print the current line to output_file.
  prev=$2                       ##Set prev to $2 for the comparison on the next line.
}'
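If you would rather not sort at all, the same split can also be written as a single awk pass. This is only a minimal sketch, not a replacement for the approach above: it assumes the number of distinct 2nd-column values is small enough that awk can keep one output file open per value, and the scratch directory is there only so the example runs on its own. The output names are built from FILENAME, which gives data.txt_25, data.txt_20 and data.txt_50 for the sample.

```shell
# Recreate the sample input from the question in a scratch directory
# (the scratch directory is an assumption, used only to keep the demo isolated).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > data.txt <<'EOF'
0 25 10 25000
1 25 7 18000
1 25 9 15000
0 20 9 1000
1 20 8 800
0 20 8 900
0 50 10 4000
0 50 5 2500
1 50 10 5000
EOF

# Write every row to a file named after the input file plus its 2nd column.
# The files stay open until awk exits, so no sorting is needed.
awk '{ print > (FILENAME "_" $2) }' data.txt
```

Within one awk run, > truncates a file only the first time it is opened, so later rows with the same value are appended and each file keeps its rows in the original order.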
Upvotes: 3
Reputation: 23667
For the given sample, you can also use this:
awk -v RS= '{f = "data.txt_" $2; print > f; close(f)}' data.txt
-v RS=
paragraph mode, empty lines are used to separate input records
f = "data.txt_" $2
construct the filename using the second column value (by default awk splits the input record on spaces/tabs/newlines)
print > f
write the input record contents to that file
close(f)
close the file
Upvotes: 2
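Paragraph mode only splits the data this way when the groups in the input are separated by empty lines. Here is a small sketch of that case; grouped.txt is a made-up file name, and the blank lines between the groups are an assumption about how the input is laid out:

```shell
# Scratch directory so the demo does not touch existing files (an assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: the same rows as data.txt, with one empty line between groups.
cat > grouped.txt <<'EOF'
0 25 10 25000
1 25 7 18000
1 25 9 15000

0 20 9 1000
1 20 8 800
0 20 8 900

0 50 10 4000
0 50 5 2500
1 50 10 5000
EOF

# With RS set to the empty string each blank-line-separated block is one record,
# so $2 is the 2nd column of the first row of the block.
awk -v RS= '{f = "grouped.txt_" $2; print > f; close(f)}' grouped.txt
```

If the input has no empty lines, the whole file is read as a single record and everything ends up in one output file.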