physu

Reputation: 169

Splitting a data file into new files based on the value in a particular column

I have a data file (data.txt) as shown below:

0  25  10  25000
1  25  7   18000
1  25  9   15000

0  20  9   1000
1  20  8   800
0  20  8   900

0  50  10  4000
0  50  5   2500
1  50  10  5000

I want to copy the rows that share the same value in the second column into separate files, so that I end up with the following three files:

data.txt_25

0  25  10  25000
1  25  7   18000
1  25  9   15000

data.txt_20

0  20  9   1000
1  20  8   800
0  20  8   900

data.txt_50

0  50  10  4000
0  50  5   2500
1  50  10  5000

I have just started learning awk. I have tried the following bash script:

  1 #!/bin/bash
  2 
  3 for var in 20 25 50
  4 do
  5         awk -v var="$var" '$2==var { print $0 }' data.txt > data.txt_$var
  6 done

The script does what I want, but it is time-consuming because I have to enter the second-column values manually on line 3.

So I would like to do this using awk alone. How can I achieve this?

Thanks in advance.

Upvotes: 3

Views: 50

Answers (2)

RavinderSingh13

Reputation: 133428

Could you please try the following? It assumes the values in your 2nd column are NOT sorted, so it sorts the input first.

sort -k2,2 Input_file |
awk '
!NF{ next }
prev!=$2{
  close(output_file)
  output_file="data.txt_"$2
}
{
  print > (output_file)
  prev=$2
}'

If the 2nd column of your Input_file is already sorted (rows with the same value are contiguous), there is no need for sort and you can run awk directly:

awk '
!NF{ next }
prev!=$2{
  close(output_file)
  output_file="data.txt_"$2
}
{
  print > (output_file)
  prev=$2
}' Input_file

Explanation: a line-by-line walkthrough of the first command.

sort -k2,2 Input_file |          ##Sort Input_file on the 2nd column only and pipe the output to awk.
awk '                            ##Start the awk program.
!NF{ next }                      ##Skip blank lines, which have no 2nd column.
prev!=$2{                        ##If the 2nd column differs from the previous line's value, do the following.
  close(output_file)             ##Close the previous output file to avoid "too many open files" errors.
  output_file="data.txt_"$2      ##Set output_file to data.txt_ followed by the 2nd column value.
}
{
  print > (output_file)          ##Write the current line to output_file.
  prev=$2                        ##Remember the current 2nd column value in prev.
}'
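
If sorting is not wanted (sort may reorder the rows inside each group, whereas the expected files in the question keep the original order), a single pass that appends to the matching file is one possible alternative. This is only a sketch and not the method above: the scratch directory, the out variable and the use of FILENAME as the file name prefix are assumptions made for the demo.

```shell
# Work in a scratch directory so no existing files are touched (assumption for this demo).
cd "$(mktemp -d)" || exit 1

# Recreate the sample input from the question; blank lines separate the groups.
printf '%s\n' \
  '0  25  10  25000' '1  25  7   18000' '1  25  9   15000' '' \
  '0  20  9   1000'  '1  20  8   800'   '0  20  8   900'   '' \
  '0  50  10  4000'  '0  50  5   2500'  '1  50  10  5000' > data.txt

# The awk program appends, so remove output left over from earlier runs.
rm -f data.txt_*

awk '
NF == 0 { next }            # skip blank separator lines
{
  out = FILENAME "_" $2     # e.g. data.txt_25
  print >> out              # append, so unsorted input never truncates earlier rows
  close(out)                # keep at most one output file open at a time
}' data.txt
```

Opening and closing a file for every row is slow on large inputs. The prev check from the commands above could be combined with >> so that a file is only closed when the 2nd column changes.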

Upvotes: 3

Sundeep

Reputation: 23667

For the given sample, you can also use this:

awk -v RS= '{f = "data.txt_" $2; print > f; close(f)}' data.txt
  • -v RS= enables paragraph mode, in which empty lines separate input records
  • f = "data.txt_" $2 constructs the filename from the second column value (by default awk splits the input record on spaces/tabs/newlines)
  • print > f writes the input record contents to that file
  • close(f) closes the file
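
To see what paragraph mode does to records and fields, here is a small sketch on made-up three-column input (the data and the result variable are assumptions for illustration only). Each blank-line-separated block becomes one record, and $2 is the second field of the block's first line.

```shell
# Two paragraphs: the first has two lines (6 fields in total), the second has one line.
result=$(printf '0 25 10\n1 25 7\n\n0 20 9\n' |
  awk -v RS= '{ print "record " NR ": $2=" $2 ", NF=" NF }')

printf '%s\n' "$result"
# record 1: $2=25, NF=6
# record 2: $2=20, NF=3
```

Because the filename comes only from that first $2, the one-liner relies on every row in a paragraph sharing the same second-column value. It also relies on each value occurring in a single paragraph, since print > f after close(f) would start the file afresh.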

Upvotes: 2
