Reputation: 601
I have a tab-separated file like this:
Supercontig_1.1 400 1500 1 4
Supercontig_1.1 400 1500 2 4
Supercontig_1.1 20000 138566 1 1
Supercontig_1.1 20000 138566 2 1
Supercontig_1.2 300 1000 1 2
Supercontig_1.2 300 1000 2 2
Supercontig_1.2 1300 15000 1 2
Supercontig_1.2 1300 15000 2 2
Supercontig_1.3 0 10000 1 5
Supercontig_1.3 0 10000 2 5
I want to split the lines into separate files based on the first field, "Supercontig_1.X": all lines with Supercontig_1.1 in one file, all lines with Supercontig_1.2 in another, and so on. I looked into the "sed" command, but I am not sure how to use it when the search pattern is not the same for all lines.
Upvotes: 2
Views: 427
Reputation: 58473
This might work for you (GNU sed):
sed -r ':a;$!N;s/^((\S*)\s.*)\n\2.*/\1/;ta;s/(\S*).*/\/^\1\/w\1/;P;D' file |
sed -nf - file
This will only work if the file is sorted.
If the file is not sorted use:
sort -u -k1,1 file | sed -r 's#^(\S*).*#/^\1/w\1#' | sed -nf - file
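To see what the first stage of that pipeline produces, it can help to save the generated script to a file instead of piping it straight into the second sed. The following is only a sketch under a few assumptions: GNU sed is available, the scratch directory and the name `split.sed` are made up for illustration, and the sample data is a shortened, unsorted version of the input from the question.

```shell
# Work in a scratch directory so the output files land somewhere disposable.
cd "$(mktemp -d)"

# Shortened, unsorted sample input (tab-separated), based on the question.
printf 'Supercontig_1.1\t400\t1500\t1\t4\n'      >  file
printf 'Supercontig_1.2\t300\t1000\t1\t2\n'      >> file
printf 'Supercontig_1.1\t20000\t138566\t1\t1\n'  >> file

# Stage 1: emit one "/^KEY/wKEY" command per distinct first field.
# split.sed is a hypothetical name for the intermediate script.
sort -u -k1,1 file | sed -r 's#^(\S*).*#/^\1/w\1#' > split.sed
cat split.sed

# Stage 2: run the generated script; -n suppresses normal output,
# so each line is only written to the file named after its first field.
sed -nf split.sed file
```

Each generated address is a prefix match, so a key such as Supercontig_1.1 would also catch lines starting with Supercontig_1.10 if the input contained any. The sample above has no such keys, but it is worth checking for real data.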
Upvotes: 2
Reputation: 36272
One way using awk:
awk '{ print $0 >$1 }' infile
That yields one file per distinct first field (each file's name is shown as a header):
==> Supercontig_1.1 <==
Supercontig_1.1 400 1500 1 4
Supercontig_1.1 400 1500 2 4
Supercontig_1.1 20000 138566 1 1
Supercontig_1.1 20000 138566 2 1
==> Supercontig_1.2 <==
Supercontig_1.2 300 1000 1 2
Supercontig_1.2 300 1000 2 2
Supercontig_1.2 1300 15000 1 2
Supercontig_1.2 1300 15000 2 2
==> Supercontig_1.3 <==
Supercontig_1.3 0 10000 1 5
Supercontig_1.3 0 10000 2 5
I don't see your fields separated with commas, only spaces. If they are comma-separated, change the field separator (FS), for example with BEGIN { FS=","; } at the beginning of the script.
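For tab-separated input with many distinct keys, a variant of the same idea sets the separator explicitly and closes each output file once the key changes, which avoids running into the open-file limit that some awk implementations have. This is a sketch, not the answer's original command: the scratch directory, the variable name `prev`, and the shortened sample data are assumptions for illustration.

```shell
# Work in a scratch directory so the output files land somewhere disposable.
cd "$(mktemp -d)"

# Shortened sample input (tab-separated), based on the question.
printf 'Supercontig_1.1\t400\t1500\t1\t4\n'   >  infile
printf 'Supercontig_1.1\t400\t1500\t2\t4\n'   >> infile
printf 'Supercontig_1.2\t300\t1000\t1\t2\n'   >> infile
printf 'Supercontig_1.3\t0\t10000\t1\t5\n'    >> infile

awk -F'\t' '
    # When the first field changes, close the previous output file.
    $1 != prev { if (prev != "") close(prev); prev = $1 }
    # Append the whole line to the file named after the first field.
    { print >> ($1) }
' infile
```

Because the files are opened in append mode, a key that reappears later in unsorted input is added to its existing file rather than overwriting it. The flip side is that output files left over from an earlier run should be removed first, or the lines will be duplicated.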
Upvotes: 3