Dividing a data file into new files based on the values in a particular column

Question

I have a data file (data.txt) as shown below:

0  25  10  25000
1  25  7   18000
1  25  9   15000

0  20  9   1000
1  20  8   800
0  20  8   900

0  50  10  4000
0  50  5   2500
1  50  10  5000

I want to copy the rows that have the same value in the second column into separate files. That is, I want to get the following three files:

data.txt_25

0  25  10  25000
1  25  7   18000
1  25  9   15000

data.txt_20

0  20  9   1000
1  20  8   800
0  20  8   900

data.txt_50

0  50  10  4000
0  50  5   2500
1  50  10  5000

I have just started learning awk. I have tried the following bash script:

  1 #!/bin/bash
  2 
  3 for var in 20 25 50
  4 do
  5         awk -v var="$var" '$2==var { print $0 }' data.txt > data.txt_$var
  6 done

The bash script does what I want, but it is tedious: I have to type the values from the second column into line 3 by hand.
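If you want to keep the loop for now, one way to stop typing the values by hand is to let awk and sort -u list the distinct second-column values first. This is only a sketch: it still rereads data.txt once per value, and it assumes the values contain no whitespace (they are plain numbers here). The heredoc at the top just recreates the sample data from the question so the snippet runs on its own.

```shell
#!/bin/bash
# Recreate the sample data from the question (illustration only).
cat > data.txt <<'EOF'
0  25  10  25000
1  25  7   18000
1  25  9   15000
0  20  9   1000
1  20  8   800
0  20  8   900
0  50  10  4000
0  50  5   2500
1  50  10  5000
EOF

# List each distinct 2nd-column value once (NF skips blank lines),
# then copy the rows with that value into data.txt_<value>.
for var in $(awk 'NF { print $2 }' data.txt | sort -u); do
    awk -v var="$var" '$2 == var' data.txt > "data.txt_$var"
done
```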

So I would like to do this with awk alone. How can I achieve this?

Thanks in advance.

RavinderSingh13 · Accepted Answer

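Could you please try the following? It assumes that the values in your 2nd column are NOT sorted, so it sorts on that column first: -k2,2n sorts numerically on the 2nd field only, and -s (supported by GNU and BSD sort) keeps rows with the same value in their original order.

sort -s -k2,2n Input_file |
awk '
prev!=$2{
  close(output_file)
  output_file="data.txt_"$2
}
{
  print > (output_file)
  prev=$2
}'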
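When column 2 holds only a few distinct values, a sort-free variant is also possible: let awk build the file name from $2 on every line and never close anything. This is a sketch, not part of the answer above. awk opens each file once (truncating it) and appends to it for the rest of the run, so rows for the same value stay together in input order even when the input is not grouped. The trade-off is that every output file stays open until awk exits; with many distinct values some awk implementations run out of file descriptors, which is what the close() in the answer avoids. The sample below is a shuffled subset of the question's rows.

```shell
#!/bin/bash
# A subset of the question's rows, deliberately NOT grouped by column 2.
cat > data.txt <<'EOF'
0  25  10  25000
0  20  9   1000
1  25  7   18000
0  50  10  4000
1  20  8   800
EOF

# One pass, no sort: send each non-blank line to data.txt_<$2>.
# Each file is truncated when first opened and stays open until awk exits.
awk 'NF { print > ("data.txt_" $2) }' data.txt
```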

If your Input_file's 2nd column is already sorted (all rows with the same value are next to each other), there is no need for sort; you can run awk directly:

awk '
prev!=$2{
  close(output_file)
  output_file="data.txt_"$2
}
{
  print > (output_file)
  prev=$2
}' Input_file
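The version without sort only needs the rows to be grouped, not in ascending order. If a value came back later, close() followed by a new > would truncate its file and lose the earlier rows. A quick check like the following (a sketch; is_grouped is a hypothetical helper name) reports the first value that reappears after a different value, and returns 0 only when every value forms one contiguous block.

```shell
#!/bin/bash
# is_grouped FILE (hypothetical helper): return 0 if each 2nd-column value
# forms one contiguous block of lines (blank lines ignored), 1 otherwise.
is_grouped() {
    awk 'NF && (!started++ || $2 != prev) {   # first line, or a new value starts
             if ($2 in seen) {                 # this value already had a block
                 print "not grouped: " $2 " reappears at line " NR
                 bad = 1
                 exit
             }
             seen[$2]                          # remember the value
             prev = $2
         }
         END { exit bad }' "$1"
}

# Rows from the question: grouped by column 2, but not in ascending order.
cat > data.txt <<'EOF'
0  25  10  25000
1  25  7   18000
0  20  9   1000
1  20  8   800
0  50  10  4000
EOF

if is_grouped data.txt; then
    echo "grouped: the version without sort is enough"
else
    echo "not grouped: pipe through sort first"
fi
```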

Explanation: Here is a detailed explanation of the above.

sort -s -k2,2n Input_file |      ##Sorting Input_file numerically on the 2nd column only (stable, so rows keep their order within a group), then passing the output to awk.
awk '                            ##Starting the awk program here.
prev!=$2{                        ##Checking if prev is NOT equal to $2 (a new value starts); if so, doing the following.
  close(output_file)             ##Closing the previous output file to avoid "too many open files" errors.
  output_file="data.txt_"$2      ##Setting output_file to data.txt_ followed by the value of $2.
}
{
  print > (output_file)          ##Printing the current line to output_file.
  prev=$2                        ##Setting prev to $2 for the next comparison.
}'
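If the blank lines shown between the groups in the question are really in data.txt (they may only be formatting), the scripts above also process them, because $2 is empty on a blank line. Without sort, the blank lines end up in a stray file named data.txt_. With sort in front, the blank lines come first, and the very first print is redirected to an empty file name, which GNU awk, for example, reports as a fatal error. A minimal sketch of a guard, assuming the same logic as the answer: add NF to both patterns so blank lines are skipped. It also sorts stably and numerically on field 2 only (sort -s -k2,2n; -s is available in GNU and BSD sort) so rows keep their original order inside each file.

```shell
#!/bin/bash
# Sample data laid out exactly as in the question, blank lines included.
cat > data.txt <<'EOF'
0  25  10  25000
1  25  7   18000
1  25  9   15000

0  20  9   1000
1  20  8   800
0  20  8   900

0  50  10  4000
0  50  5   2500
1  50  10  5000
EOF

# Same logic as the answer; "NF" makes both rules ignore blank lines.
sort -s -k2,2n data.txt |
awk '
NF && prev != $2 {
  close(output_file)
  output_file = "data.txt_" $2
}
NF {
  print > (output_file)
  prev = $2
}'
```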
