Reputation: 3352
I have a problem: I would like to split one file into several files based on a condition. INPUT: one text file
variable chrom=chr1
1000 10
1010 20
1020 10
variable chrom=chr2
1000 20
1100 30
1200 10
OUTPUT: two files for this example.
chr1.txt
variable chrom=chr1
1000 10
1010 20
1020 10
chr2.txt
variable chrom=chr2
1000 20
1100 30
1200 10
So the split condition is: whenever a row starts a new section with chrom=chr$i (i = 1..22), the following rows go to a separate text file. Thank you.
Upvotes: 0
Views: 104
Reputation: 44043
Something along these lines:
awk 'BEGIN { filename="unknown.txt" } /^variable chrom=/ { close(filename); filename = substr($0, index($0, "=") + 1) ".txt"; } { print > filename }'
Where the awk code is
BEGIN { filename = "unknown.txt" }   # default file name, used only if the input
                                     # doesn't start with a "variable chrom=" line
/^variable chrom=/ {                 # on such a marker line:
    close(filename)                  # close the previous file (if open)
                                     # and set the new file name
    filename = substr($0, index($0, "=") + 1) ".txt"
}
{ print > filename }                 # print every line to the current file
The basic algorithm is very straightforward: read the file line by line, change the file name when you find a line that starts a new section, and always print the current line to the current file. The devil is in the detail of isolating the file name from the marker line. The
filename = substr($0, index($0, "=") + 1) ".txt"
approach is simplistic but serviceable for the example you showed: it takes everything after the = and appends .txt to get the file name. If your marker lines are more complicated than variable chrom=filenamestub, this will have to be amended, but in that case I could only guess your requirements and would probably guess wrong.
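To illustrate what such an amendment might look like, here is a minimal sketch under one guessed requirement: the marker line carries extra fields after the chrom value (the span=10 field, the split_demo directory and the input.txt name are all hypothetical, not taken from the question). It uses awk's match() with RSTART and RLENGTH to cut out only the value that follows chrom=.

```shell
# Hypothetical sample input: the trailing "span=10" field is an assumption
# made for illustration; it does not appear in the original question.
mkdir -p split_demo
printf '%s\n' \
  'variable chrom=chr1 span=10' \
  '1000 10' \
  '1010 20' \
  'variable chrom=chr2 span=10' \
  '1000 20' > split_demo/input.txt

(
  cd split_demo || exit 1
  awk '
    BEGIN { filename = "unknown.txt" }     # fallback if no marker line comes first
    /^variable chrom=/ {
      if (match($0, /chrom=[^ ]+/)) {      # "chrom=" plus the value up to the next space
        close(filename)                    # close the previous output file
        # skip the 6 characters of "chrom=" and keep only the value
        filename = substr($0, RSTART + 6, RLENGTH - 6) ".txt"
      }
    }
    { print > filename }                   # every line goes to the current file
  ' input.txt
)
```

With this input, the marker lines are written unchanged into chr1.txt and chr2.txt, while the file names ignore everything after the chrom value. Other marker formats would need a different pattern.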
Upvotes: 2
Reputation: 1108
If every section has the same, known number of lines (four in the example, counting the marker line), you could use
split -l 4 textfile.txt
This splits textfile.txt into pieces of 4 lines each, named xaa, xab, and so on.
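Since split only produces generic names, the pieces could be renamed afterwards from their first line. The following is a rough sketch, assuming every section is exactly 4 lines long and begins with a variable chrom= line; the split_l_demo directory and the part_ prefix are made-up names for illustration.

```shell
# Recreate the sample input from the question in a scratch directory.
mkdir -p split_l_demo
printf '%s\n' \
  'variable chrom=chr1' '1000 10' '1010 20' '1020 10' \
  'variable chrom=chr2' '1000 20' '1100 30' '1200 10' > split_l_demo/textfile.txt

(
  cd split_l_demo || exit 1
  # Cut the file into 4-line pieces named part_aa, part_ab, ...
  split -l 4 textfile.txt part_
  for piece in part_*; do
    # Take the text after "chrom=" on the first line as the new base name.
    name=$(head -n 1 "$piece" | sed 's/^.*chrom=//')
    mv "$piece" "$name.txt"
  done
)
```

If the sections differ in length, the pieces will no longer line up with the marker lines, and the awk approach is the safer choice.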
Upvotes: 1