cosmictypist

Reputation: 575

Join lines with similar first column

File:

A    20
A    35
B    13
C    14
C    49
C    58

Expected output:

A    20,35
B    13
C    14,49,58

I have a tab-separated file as above. I want to combine lines that share the same first column, joining their second-column values with commas. I know how to combine the lines so that the values in the second column are tab-separated, but I'd like them to be comma-separated instead. This is the command I was using:

LC_ALL='C' awk -F'\t' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' input.txt > output.txt

I tried to change -F'\t' to -F',', but that didn't seem to work.

Upvotes: 1

Views: 301

Answers (5)

Akshay Hegde

Reputation: 16997

$ cat infile
A    20
A    35
B    13
C    14
C    49
C    58

$ awk '{a[$1]=($1 in a ? a[$1] ",":"") $2}END{for(i in a)print i,a[i]}' infile
A 20,35
B 13
C 14,49,58

Explanation:

  • a[$1]: a is an array, and $1 (the first field) is used as its key/index.
  • $1 in a: true if the array a already has an element with the key $1.
  • a[$1] ",": if that test is true, the existing value for the key, followed by a comma, is placed in front of the second field; otherwise (:"") an empty string is used, so the element is set to just the second field.
  • for(i in a)print i, a[i]: loops over the array a with i as the key and prints each key and its value. The order in which keys are visited is unspecified.

a[$1]=($1 in a ? a[$1] ",":"") $2 can be written as follows, which is easier for beginners to read:

# if array a already has the index $1
if($1 in a){

   # append the new value to the existing data
   a[$1] = a[$1] "," $2

# otherwise
}else{

   # first time this index is seen, so just store the value
   a[$1] = $2

}
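Putting the pieces together, here is a minimal sketch that assumes the input really is tab-separated and that the output should be tab-separated too. Because the for (i in a) loop visits keys in an unspecified order, the sketch also records the order in which keys first appear. The order array and the counter n are illustrative helpers, not part of the command above.

```shell
# Sample input; tab-separated, as the question describes.
printf 'A\t20\nA\t35\nB\t13\nC\t14\nC\t49\nC\t58\n' > infile

awk -F'\t' -v OFS='\t' '
{
    if ($1 in a) {
        a[$1] = a[$1] "," $2        # key seen before: append after a comma
    } else {
        a[$1] = $2                  # new key: store the first value
        order[++n] = $1             # hypothetical helper: remember first-seen order
    }
}
END {
    for (i = 1; i <= n; i++)        # walk keys in input order, not hash order
        print order[i], a[order[i]]
}' infile > output.txt

cat output.txt
```

Setting both -F and OFS to a tab keeps the key and the joined values separated by a tab, while the comma is only ever added between values.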

Upvotes: 1

Kaushik Nayak

Reputation: 31676

awk '{ A[$1] = A[$1] d[$1] $2; d[$1] = ","} 
END {for (i in A) print i, A[i]}' input.txt > output.txt

Explanation: A[$1] = A[$1] d[$1] $2; stores, in an associative array indexed by $1, the concatenation of A[$1], d[$1], and $2. The first time a key is seen, the result is just $2, because A[$1] and d[$1] are still unset and therefore empty. After that, d[$1] holds the output delimiter ",".

The END block loops over the array and prints each index (a unique first-column value) together with its element (the comma-separated string).
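As a small illustration of the delimiter trick, the sketch below runs the same logic on a three-line sample. It assumes tab-separated input, and it pipes the result through sort only to make the order predictable, since for (i in A) does not guarantee one. The -F/OFS settings and the sort step are additions for the demo.

```shell
printf 'A\t20\nA\t35\nB\t13\n' > input.txt

awk -F'\t' -v OFS='\t' '
{
    A[$1] = A[$1] d[$1] $2   # d[$1] is empty the first time this key appears
    d[$1] = ","              # from now on, values for this key get a leading comma
}
END { for (i in A) print i, A[i] }' input.txt | LC_ALL=C sort > output.txt

cat output.txt
```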

Upvotes: 1

karakfa

Reputation: 67507

Here is another one. It expects a grouped input file, that is, lines with the same first column must be adjacent.

$ awk -v OFS=, 'function pr() {if(p2) print p2; p1=$1; p2=$0}
                              {if($1==p1) p2=p2 OFS $2; else pr()} 
                END           {pr()}' file

A       20,35
B       13
C       14,49,58
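If the input is not already grouped, one option is to sort it on the first column before the streaming step. The following is a sketch of that idea rather than the command above: it assumes tab-separated input, and the file name unsorted.txt and the variables prev and line are made up for the example. Note that a plain sort -k1,1 breaks ties by comparing whole lines, so values within a group may come out in sorted order rather than input order.

```shell
printf 'C\t49\nA\t35\nB\t13\nA\t20\nC\t14\nC\t58\n' > unsorted.txt

LC_ALL=C sort -k1,1 unsorted.txt |
awk -F'\t' '
    NR > 1 && $1 == prev { line = line "," $2; next }  # same key: extend the current line
    NR > 1               { print line }                # key changed: flush the previous group
                         { prev = $1; line = $0 }      # start a new group
    END                  { if (NR) print line }        # flush the last group
' > output.txt

cat output.txt
```

Only one group is held in memory at a time, which is the main reason to prefer this approach over the array-based ones for very large files.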

Upvotes: 1

αғsнιη

Reputation: 2761

Simply do.

awk '{if ($1 in a) a[$1]=a[$1]","$2; else a[$1]=$2} END{for (i in a) print i"\t"a[i]}' infile

Upvotes: 0

chakradhar kasturi

Reputation: 689

The problem is that the columns are separated by multiple spaces, not by a tab.

awk -F'[[:space:]][[:space:]]+' -v OFS=' ' '{if($1 in a)a[$1]=a[$1]","$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' input.txt > output.txt
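To check which separator a file actually uses, it can help to look at the raw bytes. A quick sketch, using a made-up sample.txt in which the first line has a tab and the second has four spaces:

```shell
printf 'A\t20\nA    35\n' > sample.txt

# od -c prints a tab as \t, while spaces show up as blank cells
head -n 2 sample.txt | od -c

# Count the lines that contain at least one tab character
awk 'index($0, "\t") { n++ } END { print n + 0 }' sample.txt > tabcount.txt
cat tabcount.txt
```

If the count is zero for the real file, -F'\t' will never split a line, and every line ends up as a single field.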

Upvotes: 0
