Reputation: 3324
Dear all, I need some help.
I have this input file:
chr1
chr1
chr2
chr2
chr3
chr3
I would like to separate it into the following output files:
"1st file"
chr1
chr1
"2nd file"
chr2
chr2
"3rd file"
chr3
chr3
I am using this code, but it is not working:
for i in {1..3}
do
awk '{if ($1 == "chr"$i) {print $0}}' 17_n.tsv > $i
done
Upvotes: 0
Views: 77
Reputation: 74596
Perhaps you could use something like this:
$ cat file
chr1
chr1
chr2
chr2
chr3
chr3
$ awk '{suffix = substr($1, length($1)); print > ("file" suffix)}' file
$ cat file1
chr1
chr1
$ cat file2
chr2
chr2
$ cat file3
chr3
chr3
Basically, take the last character of the first field and use it to determine the filename.
If there can be more than one digit at the end, you can use this instead:
awk 'match($1, /[0-9]+$/) { print > ("file" substr($1, RSTART)) }' file
match sets RSTART to the position of the start of the match, so it can be used with substr to extract the numerical part of the input.
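As a rough illustration of the difference between the two commands, the sketch below runs both on a small sample that includes a two-digit name. The sample data, the temporary directory, and the output prefixes last_ and num_ are assumptions made for this example, not part of the question.

```shell
# Work in a scratch directory so the example does not clobber real files.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input with a two-digit chromosome name.
printf 'chr1\nchr10\nchr10\nchr2\n' > input.txt

# Last-character version: both chr10 lines end up in last_0, not last_10.
awk '{ suffix = substr($1, length($1)); print > ("last_" suffix) }' input.txt

# match/RSTART version: the whole trailing number is used, giving num_10.
awk 'match($1, /[0-9]+$/) { print > ("num_" substr($1, RSTART)) }' input.txt

# List the files that were created.
ls
```

With input like this, only the second command keeps chr1 and chr10 in separate files named after the full number.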
Upvotes: 1
Reputation: 195029
If your lines consist of some non-digit characters followed by some digits, you can try:
awk '{f=$0;sub(/^[^0-9]*/,"",f);print >("output"f)}' input
This won't work for a name such as ch0r1, where a digit appears before the trailing number.
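To make that limitation concrete, here is a small sketch that prints the output name the sub() approach would compute for each line instead of writing any files. The sample names chr12 and ch0r1 are assumptions chosen for illustration.

```shell
# Compute the name each line would be written to, without creating files.
names=$(printf 'chr1\nchr12\nch0r1\n' |
  awk '{ f = $0; sub(/^[^0-9]*/, "", f); print $0 " -> output" f }')

printf '%s\n' "$names"
# chr1 -> output1
# chr12 -> output12
# ch0r1 -> output0r1
```

Only the leading run of non-digits is stripped, so ch0r1 maps to output0r1 rather than output1.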
If you want it to work for ch0r1 too, use gawk (gensub is a gawk extension):
gawk '{f=gensub(/^.*[^0-9]([0-9]*)$/,"\\1","g");print >("output"f)}' file
Upvotes: 1
Reputation: 3646
As awk is a separate language with its own interpreter, bash variables can't be used properly in awk without passing them first using the -v option. Also, the default action in awk is to print, so you don't need {print $0}.
So this would work:
for i in {1..3}
do
awk -v i="$i" '$1 == "chr"i' 17_n.tsv > "$i"
done
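To see why the loop in the question produces empty files, it may help to print what awk actually compares against. The sketch below uses a hypothetical two-line sample rather than the real 17_n.tsv, and the variable names sample, unpassed, and passed exist only for this example.

```shell
# Hypothetical sample in the same one-column format as 17_n.tsv.
sample=$(printf 'chr1\nchr2')
i=1

# Single quotes stop the shell from expanding $i, so awk sees its own unset
# variable i. $i is then $0, and "chr"$i is "chr" followed by the whole line.
unpassed=$(printf '%s\n' "$sample" | awk '{ print "chr"$i }')

# With -v, awk's variable i holds the shell value 1, so "chr"i is "chr1".
passed=$(printf '%s\n' "$sample" | awk -v i="$i" '$1 == "chr"i')

printf '%s\n' "$unpassed"   # chrchr1 then chrchr2
printf '%s\n' "$passed"     # chr1
```

Since $1 never equals a string like chrchr1, the original condition is false for every line and nothing is printed.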
That said, you can also accomplish what you want in a read loop:
while read -r line
do
[[ $line == chr+([0-9]) ]] && echo "$line" >> "${line#chr}"
done < 17_n.tsv
Upvotes: 1