Vonton

Reputation: 3352

Awk - Separate one .txt file to files by condition

I would like to split one text file into several files based on a condition. INPUT: one text file

variable chrom=chr1
1000 10
1010 20
1020 10
variable chrom=chr2
1000 20
1100 30
1200 10

OUTPUT: two files for this example.

chr1.txt

variable chrom=chr1
1000 10
1010 20
1020 10

chr2.txt

variable chrom=chr2
1000 20
1100 30
1200 10

So the split condition is: every row of the form variable chrom=chr$i (i={1..22}) starts a new text file. Thank you

Upvotes: 0

Views: 104

Answers (2)

Wintermute

Reputation: 44043

Something along these lines:

awk 'BEGIN { filename="unknown.txt" } /^variable chrom=/ { close(filename); filename = substr($0, index($0, "=") + 1) ".txt"; } { print > filename }'

Where the awk code is

BEGIN { filename="unknown.txt" }   # default file name, used only if the
                                   # file doesn't start with a variable chrom=
                                   # line
/^variable chrom=/ {               # in such a line:
  close(filename)                  # close the previous file (if open)
                                   # and set the new filename
  filename = substr($0, index($0, "=") + 1) ".txt"
}
{ print > filename }               # print everything to the current file.

The basic algorithm is straightforward: read the file line by line, change the file name when you find a line that starts a new section, and always print the current line to the current file. The devil is in the detail of isolating the file name from the marker line. The

filename = substr($0, index($0, "=") + 1) ".txt"

approach is simplistic but serviceable for the example you showed: it takes everything after the = and appends .txt to get the file name. If your marker lines are more complicated than variable chrom=filenamestub, this will have to be amended, but in that case I could only guess at your requirements and would probably guess wrong.
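For illustration, here is a stricter variant of the same idea, as a sketch only. It assumes the marker lines look exactly like variable chrom=chrN with N from 1 to 22 (the range mentioned in the question) and have no trailing whitespace. The file name input.txt is a placeholder. Unlike the version above, it has no unknown.txt fallback: lines before the first marker are skipped.

```shell
# Recreate the sample input from the question (input.txt is a placeholder name).
cat > input.txt <<'EOF'
variable chrom=chr1
1000 10
1010 20
1020 10
variable chrom=chr2
1000 20
1100 30
1200 10
EOF

awk '
  # Marker line: "variable chrom=chrN" with N in 1..22 (assumed format).
  /^variable chrom=chr([1-9]|1[0-9]|2[0-2])$/ {
    if (filename != "") close(filename)               # close the previous file
    filename = substr($0, index($0, "=") + 1) ".txt"  # e.g. chr1.txt
  }
  # Lines seen before the first marker have no target file and are skipped.
  filename != "" { print > filename }
' input.txt
```

Any line that does not match the strict pattern, for example a hypothetical chrX marker, is treated as ordinary data and goes to whichever file is currently open.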

Upvotes: 2

ZN13

Reputation: 1108

If you know that every section has the same number of lines, you could use

split -l 4 textfile.txt

This splits textfile.txt into pieces of 4 lines each, named xaa, xab, and so on.
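To get the chr1.txt and chr2.txt names from the question, the pieces could be renamed afterwards. The following is a rough sketch, assuming every section is exactly 4 lines long and begins with its variable chrom= line. The part_ prefix is an arbitrary choice for this example.

```shell
# Recreate the sample input from the question.
cat > textfile.txt <<'EOF'
variable chrom=chr1
1000 10
1010 20
1020 10
variable chrom=chr2
1000 20
1100 30
1200 10
EOF

# Cut into 4-line pieces named part_aa, part_ab, ... ("part_" is an arbitrary prefix).
split -l 4 textfile.txt part_

# Rename each piece after the value following "=" on its first line.
for f in part_*; do
  name=$(head -n 1 "$f" | sed 's/.*=//')
  mv "$f" "$name.txt"
done
```

If the sections differ in length, the pieces no longer line up with the marker lines and the renaming breaks down, so this only suits strictly regular input.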

Upvotes: 1
