shams

Reputation: 162

Split rows into multiple lines based on commas: one-liner solution

I want to split lines in the following format into unique lines.

Input:

17:79412041:C:T,CGGATGTCAT
17:79412059:C:G,T
17:79412138:G:A,C
17:79412192:C:G,T,A

Desired output:

17:79412041:C:T
17:79412041:C:CGGATGTCAT
17:79412059:C:G
17:79412059:C:T
17:79412138:G:A
17:79412138:G:C
17:79412192:C:G
17:79412192:C:T
17:79412192:C:A

Basically, split each input row into unique rows of the form firstID:secondID:thirdID:fourthID. Multiple rows may share the same firstID:secondID:thirdID; the fourthID is what makes each row unique (the fourthIDs are separated by "," in the input).

Thanks in advance, Shams

Upvotes: 3

Views: 1762

Answers (5)

potong

Reputation: 58371

This might work for you (GNU sed):

sed 's/^\(\(.*:\)[^:,]*\),/\1\n\2/;P;D' file

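At the first comma in a line, insert a newline followed by the key (everything up to and including the last colon). P then prints the first line of the pattern space, and D deletes it and restarts the cycle on the remainder, so each remaining comma is handled the same way.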

An alternative using a loop and syntactic sugar:

sed -r ':a;s/(([^\n]*:)[^:,\n]*),/\1\n\2/;ta' file

Upvotes: 0

Rahul Verma

Reputation: 3089

awk one-liner

$ awk -F":" '{gsub(/,/,":"); a=$1FS$2FS$3; for(i=4; i<=NF; i++) print a FS $i;}' f1
17:79412041:C:T
17:79412041:C:CGGATGTCAT
17:79412059:C:G
17:79412059:C:T
17:79412138:G:A
17:79412138:G:C
17:79412192:C:G
17:79412192:C:T
17:79412192:C:A

First, we replace every , with : so that the whole line uses a single delimiter, :.

Then we loop from the 4th field to the last one and print each field prefixed with the first three fields.
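The loop can see each alternative as its own field because awk re-splits the record whenever $0 is modified, and gsub without a target argument operates on $0. A small check of that behaviour, using one sample line from the question (the function name count_fields is only illustrative), prints NF before and after the substitution:

```shell
# Print the number of fields before and after gsub rewrites $0.
count_fields() {
    awk -F: '{ before = NF; gsub(/,/, ":"); print before, NF }'
}

echo '17:79412192:C:G,T,A' | count_fields
```

For this line the count goes from 4 to 6, which is why the for loop starting at field 4 reaches G, T and A in turn.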

Upvotes: 1

RavinderSingh13

Reputation: 133458

The following awk solution, using gsub, may also help:

awk -F":" '{gsub(",",ORS $1 OFS $2 OFS $3 "&");gsub(/,/,":")} 1' OFS=":" Input_file

Upvotes: 0

karakfa

Reputation: 67467

Another awk solution; it should work for any number of fields:

$ awk -F: '{split($NF,a,","); for(i in a) {sub($NF"$",a[i]); print}}' file
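Two things may be worth keeping in mind with this approach: for (i in a) visits array elements in an unspecified order, and sub($NF"$", ...) treats the last field as a regular expression. A possible variant that sidesteps both is sketched below, assuming the comma-separated alternatives always sit in the last field; the function name split_last_field is made up for the example.

```shell
# Expand the comma-separated last field into one output line per alternative.
split_last_field() {
    awk -F: '{
        n = split($NF, alt, ",")      # n is the number of alternatives
        prefix = $0
        sub(/[^:]*$/, "", prefix)     # drop the last field, keep the trailing ":"
        for (i = 1; i <= n; i++)      # indexed loop preserves the input order
            print prefix alt[i]
    }' "$@"
}

printf '%s\n' 'a:b:X,Y' '17:79412192:C:G,T,A' | split_last_field
```

The function reads standard input when called without arguments, or the files passed to it otherwise, e.g. split_last_field file.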

Upvotes: 0

Andrey Tyukin

Reputation: 44918

This one-liner here:

$ awk -F':' '{ split($4,a,","); for (i in a) { print $1":"$2":"$3":"a[i] } }' data.txt

Produces:

17:79412041:C:T
17:79412041:C:CGGATGTCAT
17:79412059:C:G
17:79412059:C:T
17:79412138:G:A
17:79412138:G:C
17:79412192:C:G
17:79412192:C:T
17:79412192:C:A

Explanation:

split(string, array, delimiter)

splits the string by the delimiter, and saves the pieces into the array.

The for-in loop prints each piece in the array, prefixed with the first three fields.

The -F':' option sets the field separator used to split each input line.
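As a small illustration of split on its own: it also returns the number of pieces, so the array can be walked by index when the output order matters (awk does not specify the order of a for-in loop). The string below is taken from the last sample line, and the function name show_split is only for the example.

```shell
# Show what split() stores in the array and what it returns.
show_split() {
    awk 'BEGIN {
        n = split("G,T,A", a, ",")    # a[1]="G", a[2]="T", a[3]="A"
        print "pieces:", n
        for (i = 1; i <= n; i++)      # walk the array in index order
            print i, a[i]
    }'
}

show_split
```

This prints the piece count followed by one "index value" pair per line.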

Upvotes: 1
