Reputation: 162
I want to split lines in the following format so that each value gets its own line.
Input:
17:79412041:C:T,CGGATGTCAT
17:79412059:C:G,T
17:79412138:G:A,C
17:79412192:C:G,T,A
Desired output:
17:79412041:C:T
17:79412041:C:CGGATGTCAT
17:79412059:C:G
17:79412059:C:T
17:79412138:G:A
17:79412138:G:C
17:79412192:C:G
17:79412192:C:T
17:79412192:C:A
Basically, I want to split the input into unique rows of the form firstID:secondID:thirdID:fourthID. Several rows may share the same firstID:secondID:thirdID, and the fourthID is what makes each row unique (these values were separated by "," in the input).
Thanks in advance, Shams
Upvotes: 3
Views: 1762
Reputation: 58371
This might work for you (GNU sed):
sed 's/^\(\(.*:\)[^:,]*\),/\1\n\2/;P;D' file
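The substitution replaces the first comma in the pattern space with a newline followed by the key (everything up to and including the last colon before that comma). P then prints the first line of the pattern space, and D deletes it and restarts the cycle on the remainder, so every comma in the line is handled in turn.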
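An alternative using a loop and extended regex syntax (-r). In GNU sed, . and negated bracket expressions also match a newline in the pattern space, so a key written as (.*:) can swallow lines already split off once a line has more than one comma. The version below therefore restricts the key to a single line:
sed -r ':a;s/(^|\n)(([^\n,]*:)[^:,\n]*),/\1\2\n\3/;ta' file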
Upvotes: 0
Reputation: 3089
awk one-liner
$ awk -F":" '{gsub(/,/,":"); a=$1FS$2FS$3; for(i=4; i<=NF; i++) print a FS $i;}' f1
17:79412041:C:T
17:79412041:C:CGGATGTCAT
17:79412059:C:G
17:79412059:C:T
17:79412138:G:A
17:79412138:G:C
17:79412192:C:G
17:79412192:C:T
17:79412192:C:A
We first replace every , with : so that the whole record uses a single delimiter, :. Because gsub modifies $0, awk re-splits the record into fields.
We then loop from the 4th field to the last one and print each field prefixed with the first three fields.
Upvotes: 1
Reputation: 133458
The following awk + gsub solution may also help:
awk -F":" '{gsub(",",ORS $1 OFS $2 OFS $3 "&");gsub(/,/,":")} 1' OFS=":" Input_file
Upvotes: 0
Reputation: 67467
Another awk, which should work for any number of fields:
$ awk -F: '{split($NF,a,","); for(i in a) {sub($NF"$",a[i]); print}}' file
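Note that sub($NF"$", a[i]) treats the last field as a regular expression, and for (i in a) does not guarantee input order. A minimal sketch of a variant that avoids both, assuming the values to split are always in the last colon-delimited field (split_last_field is a hypothetical name, not part of the answer above):

```shell
# Hypothetical helper: reads colon-delimited records on stdin and emits
# one line per comma-separated value found in the last field.
split_last_field() {
  awk -F: '{
    # Everything before the last field, including the trailing colon.
    prefix = substr($0, 1, length($0) - length($NF))
    # split() returns the number of pieces; indices run from 1 to n.
    n = split($NF, alts, ",")
    for (i = 1; i <= n; i++) print prefix alts[i]
  }'
}

printf '%s\n' '17:79412192:C:G,T,A' | split_last_field
```

Because the prefix is built with substr rather than a regex substitution, characters such as * or . in the last field are handled literally.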
Upvotes: 0
Reputation: 44918
This one-liner here:
$ awk -F':' '{ split($4,a,","); for (i in a) { print $1":"$2":"$3":"a[i] } }' data.txt
Produces:
17:79412041:C:T
17:79412041:C:CGGATGTCAT
17:79412059:C:G
17:79412059:C:T
17:79412138:G:A
17:79412138:G:C
17:79412192:C:G
17:79412192:C:T
17:79412192:C:A
Explanation:
split(string, array, delimiter) splits the string by the delimiter, and saves the pieces into the array.
The for-in loop simply prints every piece in the array with the first three entries.
The -F':' part defines the top-level delimiter.
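One caveat: POSIX leaves the traversal order of for (i in a) unspecified, so the pieces may not come out in input order with every awk. A sketch of an order-preserving variant, assuming exactly four colon-delimited fields as in the question (split_alts is a hypothetical name):

```shell
# Hypothetical wrapper; assumes exactly four colon-delimited fields.
# Reads the files given as arguments, or stdin when none are given.
split_alts() {
  awk -F: '{
    # split() returns the piece count, so an indexed loop keeps input order.
    n = split($4, alts, ",")
    for (i = 1; i <= n; i++) print $1 ":" $2 ":" $3 ":" alts[i]
  }' "$@"
}

printf '%s\n' '17:79412041:C:T,CGGATGTCAT' | split_alts
```

With a file, the call would be split_alts data.txt.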
Upvotes: 1