aan

Reputation: 89

Separate comma delimited cells (varying entries) to new rows

I have a tab-separated .txt file with two columns. The first column holds one or more comma-separated values. Input format:

a,b,c x
a     y
c,d,e z

Desired output is

a x
b x
c x
a y
c z
d z
e z

The number of comma-separated entries varies between 1 and 40. I tried the following awk command, but it only works when the first column has exactly two comma-separated entries:

awk 'BEGIN{FS="[\t,]"} {print $1, $NF; print $2, $NF}' input

How can I modify the above command (or use something else) to do the job? Thanks.

Upvotes: 2

Views: 119

Answers (6)

karakfa

Reputation: 67507

Another awk:

$ awk -F',|[ \t]+' '{for(i=1;i<NF;i++) print $i,$NF}' file

Set the field delimiter to a comma or a run of spaces or tabs, then print every field except the last one paired with the last field.
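
To try this on the question's sample, here is a minimal sketch. It assumes the file really is tab-separated, as the question states, so the delimiter accepts tabs as well as spaces. The function name `split_rows` is hypothetical and only wraps the one-liner.

```shell
# split_rows: hypothetical helper name, not from the original answer.
# Reads lines on stdin; fields are separated by a comma or by a run of
# spaces/tabs. Every field before the last is printed with the last field.
split_rows() {
    awk -F',|[ \t]+' '{ for (i = 1; i < NF; i++) print $i, $NF }'
}

# Sample input from the question, with a real tab between the two columns.
printf 'a,b,c\tx\na\ty\nc,d,e\tz\n' | split_rows
```

On that sample this prints the seven rows listed as the desired output in the question. Note that a first-column entry containing a space would be split as well, since spaces count as delimiters here.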

Upvotes: 3

ctac_

Reputation: 2471

You can also use sed:

sed -E ':A;s/([^,]*),(.*)(\t.*$)/\1\3\n\2\3/;tA' infile

The output is tab-separated.

If you don't want the tab:

sed -E ':A;s/([^,]*),(.*)(\t.*$)/\1\3\n\2\3/;tA;s/\t/ /g' infile

Upvotes: 0

potong

Reputation: 58420

This might work for you (GNU sed):

sed -r 's/\s+(\S+)$/ \1/;s/,(.*( \S+))$/\2\n\1/;P;D' file

Collapse the whitespace before the last field into a single space. Replace the first comma with a space and the last field of the current line, followed by a newline. Print the first line of the pattern space, delete it, and repeat on what remains.
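
The cycle can be traced on the first sample line. This is a sketch assuming GNU sed and a tab between the columns; the function name `expand_rows_sed` is hypothetical, and the pattern-space states in the comments were worked out by hand.

```shell
# expand_rows_sed: hypothetical wrapper around the command above (GNU sed).
expand_rows_sed() {
    sed -r 's/\s+(\S+)$/ \1/;s/,(.*( \S+))$/\2\n\1/;P;D'
}

# Pattern space for the input line "a,b,c<TAB>x":
#   after s/\s+(\S+)$/ \1/        -> "a,b,c x"
#   after s/,(.*( \S+))$/\2\n\1/  -> "a x<NL>b,c x"
#   P prints "a x"; D deletes it and restarts the script on "b,c x"
#   second pass                   -> "b x<NL>c x", P prints "b x"
#   third pass: no comma is left, P prints "c x", D reads the next line
printf 'a,b,c\tx\n' | expand_rows_sed
```

Because `D` restarts the script without reading new input while the pattern space still holds text, no explicit loop label is needed.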

Upvotes: 0

Panos Papadopoulos

Reputation: 117

BEGIN {
    FS = "\t"
}

NF == 2 {
    n = split($1, a, ",")
    for (i = 1; i <= n; i++) {
        print a[i], $2
    }
}
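
The script above can be saved to a file and run with `awk -f`, or passed inline. Below is a sketch of the inline form; the wrapper name `expand_rows` and the malformed middle line are illustrative assumptions, not part of the original answer.

```shell
# expand_rows: hypothetical wrapper name, not part of the original answer.
expand_rows() {
    awk '
    BEGIN { FS = "\t" }            # split columns on tab only
    NF == 2 {                      # skip lines without exactly two columns
        n = split($1, a, ",")      # n = number of comma-separated entries
        for (i = 1; i <= n; i++)
            print a[i], $2
    }'
}

# The middle line has no tab, so the NF == 2 guard drops it.
printf 'a,b\tx\nno-tab-here\nc\ty\n' | expand_rows
```

Since `split` returns the number of pieces, the loop visits the entries in their original order, and a first column with a single entry simply yields one row.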

Upvotes: 1

choroba

Reputation: 241868

Perl to the rescue!

perl -lane 'print "$F[0] $_ $F[2]" for glob "{$F[1]}"' file
  • -n reads the input line by line
  • -l removes newlines from the input and adds them to prints
  • -a splits each line on whitespace into the @F array
  • glob expands comma separated lists in curly brackets similarly to the shell

Update: You changed the specification (without marking any update). The solution remains similar, just use

perl -lane 'print "$_ $F[1]" for glob "{$F[0]}"' file

Upvotes: 2

alexanderlz

Reputation: 589

awk's split function to the rescue:

awk '{n = split($1, arr, ","); for (i = 1; i <= n; i++) print arr[i], $2}' input

Upvotes: 1
