Jon

Reputation: 601

Split file or extract lines that differ based on a pattern

I have a tab-separated file like this:

Supercontig_1.1 400  1500  1       4
Supercontig_1.1 400  1500  2       4
Supercontig_1.1 20000  138566  1       1
Supercontig_1.1 20000  138566  2       1
Supercontig_1.2 300  1000  1       2
Supercontig_1.2 300  1000  2       2
Supercontig_1.2 1300  15000  1       2
Supercontig_1.2 1300  15000  2       2
Supercontig_1.3 0  10000  1       5
Supercontig_1.3 0  10000  2       5

I want to write all lines that share the same first field ("Supercontig_1.X") to a separate file, i.e. all lines with Supercontig_1.1 in one file, all lines with Supercontig_1.2 in another, and so on. I looked into the "sed" command, but I am not sure how to use it when the search pattern is not the same for all lines.

Upvotes: 2

Views: 427

Answers (2)

potong

Reputation: 58473

This might work for you (GNU sed):

sed -r ':a;$!N;s/^((\S*)\s.*)\n\2.*/\1/;ta;s/(\S*).*/\/^\1\/w\1/;P;D' file | 
sed -nf - file

This will only work if the file is sorted.

If the file is not sorted use:

sort -u -k1,1 file | sed -r 's#^(\S*).*#/^\1/w\1#' | sed -nf - file
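To see what this pipeline does, it can help to save the generated sed script to a file and look at it before running it. The following is a minimal sketch assuming GNU sed; the three-line sample input and the name split.sed are made up for illustration, and the script is stored in a file instead of being piped through "-f -".

```shell
# Work in a scratch directory so the generated files do not clutter anything
cd "$(mktemp -d)"

# Small unsorted, tab-separated sample modelled on the question's data
printf 'Supercontig_1.1\t400\t1500\t1\t4\nSupercontig_1.2\t300\t1000\t1\t2\nSupercontig_1.1\t400\t1500\t2\t4\n' > file

# Stage 1: keep one line per distinct first field and turn it into
# a sed command of the form /^KEY/wKEY (write matching lines to file KEY)
sort -u -k1,1 file | sed -r 's#^(\S*).*#/^\1/w\1#' > split.sed

# Inspect the generated script
cat split.sed

# Stage 2: run the generated script; -n suppresses normal output,
# so lines only go to the files named by the w commands
sed -n -f split.sed file
```

Note that each generated address is a prefix match, so /^Supercontig_1.1/ would also match a line starting with Supercontig_1.10. If the data contains such names, the address likely needs to be tightened, for example by also matching the whitespace after the first field.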

Upvotes: 2

Birei

Reputation: 36272

One way using awk:

awk '{ print $0 >$1 }' infile

That yields one file per value of the first field:

==> Supercontig_1.1 <==
Supercontig_1.1 400  1500  1       4
Supercontig_1.1 400  1500  2       4
Supercontig_1.1 20000  138566  1       1
Supercontig_1.1 20000  138566  2       1

==> Supercontig_1.2 <==
Supercontig_1.2 300  1000  1       2
Supercontig_1.2 300  1000  2       2
Supercontig_1.2 1300  15000  1       2
Supercontig_1.2 1300  15000  2       2

==> Supercontig_1.3 <==
Supercontig_1.3 0  10000  1       5
Supercontig_1.3 0  10000  2       5

I don't see your fields separated with commas, only whitespace, which awk splits on by default. If they are comma separated, change the field separator (FS), for example with BEGIN { FS="," } at the beginning of the script.
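For tab-separated input like the one in the question, a variant with an explicit separator might look like the sketch below. It also closes each output file after writing, since some awk implementations limit the number of files open at once, which could matter when there are many distinct first fields. The sample data and the variable name out are assumptions made for illustration.

```shell
# Work in a scratch directory; ">>" appends, so start without old output files
cd "$(mktemp -d)"

# Small unsorted, tab-separated sample modelled on the question's data
printf 'Supercontig_1.1\t400\t1500\t1\t4\nSupercontig_1.2\t300\t1000\t1\t2\nSupercontig_1.1\t400\t1500\t2\t4\n' > infile

# -F'\t' splits on tabs only; each line is appended to the file named
# after its first field, and the file is closed again right away
awk -F'\t' '{ out = $1; print >> out; close(out) }' infile
```

Because the files are opened in append mode, running the command twice in the same directory would duplicate the lines, so remove the old output files first.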

Upvotes: 3
