Vonton

Reputation: 3324

awk - split a file into separate files by the value of column 1, using a loop

Dear all, I need some help.

I have this input file:

 chr1
 chr1 
 chr2 
 chr2 
 chr3 
 chr3

I would like to separate it into the following output files:

"1st file"

chr1
chr1

"2nd file"

chr2 
chr2

"3rd file"

chr3
chr3

I am using this code, but it is not working:

for i in {1..3}                 
do 
    awk '{if ($1 == "chr"$i) {print $0}}' 17_n.tsv > $i 
done

Upvotes: 0

Views: 77

Answers (3)

Tom Fenech

Reputation: 74596

Perhaps you could use something like this:

$ cat file
 chr1
 chr1
 chr2
 chr2
 chr3
 chr3
$ awk '{suffix = substr($1, length($1)); print > ("file" suffix)}' file
$ cat file1
 chr1
 chr1
$ cat file2
 chr2
 chr2
$ cat file3
 chr3
 chr3

Basically, take the last character of the first field and use it to determine the filename.
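To make the single-digit limitation concrete, here is a small sketch of what the last-character rule produces for a few names. chr10 and chr11 are hypothetical extra inputs, not from the question; they show that two-digit names end up in the wrong file.

```shell
# Sketch: apply the last-character rule to a few sample names.
# chr10 and chr11 are hypothetical inputs, not from the question.
mapping=$(printf '%s\n' chr1 chr2 chr10 chr11 |
  awk '{ print $1 " -> file" substr($1, length($1)) }')
printf '%s\n' "$mapping"
# chr1 -> file1
# chr2 -> file2
# chr10 -> file0
# chr11 -> file1   (lands in the same file as chr1)
```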

If there can be more than one digit at the end, you can use this instead:

awk 'match($1, /[0-9]+$/) { print > ("file" substr($1, RSTART)) }' file

match sets RSTART to the position where the match starts, so it can be used with substr to extract the numeric part of the first field. Because match returns 0 when there is no match, lines that don't end in digits are skipped.
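A rough illustration of what match leaves behind in RSTART and RLENGTH (RLENGTH is the length of the matched text). The names chr12 and chrX are made-up inputs chosen to show a multi-digit match and a non-match:

```shell
# Sketch: inspect RSTART/RLENGTH after match() on a few sample names.
# chr12 and chrX are hypothetical inputs, not from the question.
result=$(printf '%s\n' chr1 chr12 chrX |
  awk '{
    if (match($1, /[0-9]+$/))      # nonzero (RSTART) on success, 0 otherwise
      print $1, RSTART, RLENGTH, substr($1, RSTART)
    else
      print $1, "skipped"
  }')
printf '%s\n' "$result"
# chr1 4 1 1
# chr12 4 2 12
# chrX skipped
```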

Upvotes: 1

Kent

Reputation: 195029

If your lines consist of some non-digit characters followed by some digits, you can try:

awk '{f=$0;sub(/^[^0-9]*/,"",f);print >("output"f)}' input

This won't work for something like ch0r1.
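A quick sketch of why: sub only strips the leading run of non-digits, so for ch0r1 everything from the first digit onward is kept. The inputs chr22 and ch0r1 are hypothetical examples:

```shell
# Sketch: the sub() step on names that fit the "non-digits then digits"
# shape and on ch0r1, which does not. Sample inputs are hypothetical.
names=$(printf '%s\n' chr1 chr22 ch0r1 |
  awk '{ f = $0; sub(/^[^0-9]*/, "", f); print $0 " -> output" f }')
printf '%s\n' "$names"
# chr1 -> output1
# chr22 -> output22
# ch0r1 -> output0r1
```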

If you want it to work for ch0r1 too, use gawk (gensub is a gawk extension):

gawk '{f=gensub(/^.*[^0-9]([0-9]*)$/,"\\1","g");print >("output"f)}' file
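If gawk isn't available, something like the following should give the same file names in a POSIX awk. This is my own translation, not the answer's code: the greedy /^.*[^0-9]/ deletes everything up to and including the last non-digit, which leaves exactly the trailing digits that the gensub capture keeps. The scratch directory and the file name input are assumptions for the demo.

```shell
# Sketch: POSIX-awk stand-in for the gensub() version (not the answer's code).
cd "$(mktemp -d)" || exit 1            # scratch directory for the demo files
printf '%s\n' chr1 chr1 chr22 ch0r1 > input

# Greedy ^.*[^0-9] removes up to the last non-digit; f keeps the trailing digits.
awk '{ f = $0; sub(/^.*[^0-9]/, "", f); print > ("output" f) }' input
```

Afterwards output1 holds chr1, chr1 and ch0r1, and output22 holds chr22.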

Upvotes: 1

John B

Reputation: 3646

As awk is a separate language with its own interpreter, it doesn't see bash variables: inside the single-quoted program, $i is an awk field reference, not the shell variable. You have to pass the value in first using the -v option. Also, a pattern with no action prints the line by default, so you don't need {print $0}.

So this would work:

for i in {1..3}                 
do 
    awk -v i="$i" '$1 == "chr" i' 17_n.tsv > "$i"
done
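To see concretely why the original loop produced empty files, here is a small sketch (the single line chr2 is just sample input). In awk, an unset variable has the numeric value 0, so $i in the question's program is $0, the whole line:

```shell
# Sketch: why "chr"$i never matched in the question's loop.
i=2                                    # a shell variable, invisible to awk by default

# Single quotes stop bash from expanding $i; awk's own i is unset (0),
# so $i is $0 and "chr"$i becomes "chr" followed by the whole line.
unquoted=$(printf 'chr2\n' | awk '{ print "chr"$i }')

# With -v, awk's variable i holds the shell's value before the program runs.
passed=$(printf 'chr2\n' | awk -v i="$i" '{ print "chr" i }')

printf '%s\n' "$unquoted"   # chrchr2, so $1 == "chr"$i is never true
printf '%s\n' "$passed"     # chr2
```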

That said, you can also accomplish what you want in a read loop:

while read -r line
do
    [[ $line == chr+([0-9]) ]] && echo "$line" >> "${line#chr}"
done < 17_n.tsv
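A self-contained sketch of the read loop for trying it out. It recreates 17_n.tsv in a scratch directory with the question's sample lines (stray spaces included). Two details matter. First, read with the default IFS strips leading and trailing blanks, so " chr1 " arrives as chr1. Second, >> appends, so old output files should be removed before re-running. The explicit shopt -s extglob is only needed on bash versions older than 4.1, where +([0-9]) inside [[ ]] is not enabled automatically:

```shell
#!/usr/bin/env bash
# Sketch: the read loop run on a recreated copy of the question's input.
shopt -s extglob                       # enables +([0-9]) on older bash
cd "$(mktemp -d)" || exit 1            # scratch directory for the demo
printf ' chr1\n chr1 \n chr2 \n chr2 \n chr3 \n chr3\n' > 17_n.tsv

while read -r line                     # default IFS trims surrounding blanks
do
    # ${line#chr} drops the "chr" prefix, so chr1 goes to a file named 1
    [[ $line == chr+([0-9]) ]] && echo "$line" >> "${line#chr}"
done < 17_n.tsv
```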

Upvotes: 1
