Reputation: 575
File:
A 20
A 35
B 13
C 14
C 49
C 58
Expected output:
A 20,35
B 13
C 14,49,58
I have a tab-separated file as above. I want to merge lines that share the same first column, joining their second-column values with commas. I know how to combine the lines so that the merged values are tab-separated, but I'd like the combined values in column 2 to be separated by commas instead. This is the command I was using:
LC_ALL='C' awk -F'\t' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' input.txt > output.txt
I tried changing -F'\t' to -F',', but that didn't seem to work.
Upvotes: 1
Views: 301
Reputation: 16997
$ cat infile
A 20
A 35
B 13
C 14
C 49
C 58
$ awk '{a[$1]=($1 in a ? a[$1] ",":"") $2}END{for(i in a)print i,a[i]}' infile
A 20,35
B 13
C 14,49,58
Explanation:
a[$1]: a is an array and $1 is the first field, used as the array key/index.
$1 in a: true if array a already has the index $1.
a[$1] ",": if the previous test is true, the existing content of a for that index, followed by a comma, is placed in front of the second field.
:"": otherwise (the array has no such index yet) an empty string is used, which leaves just the second field.
for(i in a)print i, a[i]: loop through array a with variable i as the key, and print each key and its value.
a[$1]=($1 in a ? a[$1] ",":"") $2 can be written as follows, which is easier for beginners to read and understand.
# if array a has seen index $1 before
if ($1 in a) {
    # append to the existing data
    a[$1] = a[$1] "," $2
} else {
    # not seen before, so just store the new data
    a[$1] = $2
}
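One thing to be aware of: awk does not specify the order in which for (i in a) visits keys, so the output lines may not come out in input order. A minimal sketch of an order-preserving variant follows, assuming two-column, whitespace-separated input on stdin. The function name combine_in_order and the helper array order are made up for illustration.

```shell
# Hypothetical helper name; reads two-column, whitespace-separated input on stdin.
combine_in_order() {
  awk '
    # First time this key is seen: remember its position and store the value.
    !($1 in a) { order[++n] = $1; a[$1] = $2; next }
    # Key seen before: append the new value after a comma.
    { a[$1] = a[$1] "," $2 }
    # Print keys in first-seen order instead of relying on for-in order.
    END { for (i = 1; i <= n; i++) print order[i], a[order[i]] }
  '
}

printf 'C 14\nA 20\nC 49\nA 35\nB 13\nC 58\n' | combine_in_order
```

With that ungrouped sample input, the keys should come out as C, A, B, each followed by its comma-joined values.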
Upvotes: 1
Reputation: 31676
awk '{ A[$1] = A[$1] d[$1] $2; d[$1] = ","}
END {for (i in A) print i, A[i]}' input.txt > output.txt
Explanation:
A[$1] = A[$1] d[$1] $2; sets the element of an associative array with index $1 to the value A[$1] d[$1] $2. Initially this equals $2, because A[$1] and d[$1] are not yet defined. d[$1] then stores the output delimiter ",".
The END block loops over the array and prints each index (the unique first-column value) and its element (the ","-separated string).
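For the tab-separated file in the question, the same idea can keep -F'\t' and OFS='\t' as they were: -F only controls how input lines are split, which is why switching it to ',' did not help, while the comma comes from the concatenation itself. A rough sketch, where the function name combine_tsv is hypothetical and the trailing sort is only there because for-in order is unspecified:

```shell
# Hypothetical helper name; expects tab-separated key/value lines on stdin.
combine_tsv() {
  awk -F'\t' -v OFS='\t' '
    # d[$1] is empty for the first value of a key and "," afterwards.
    { A[$1] = A[$1] d[$1] $2; d[$1] = "," }
    # OFS puts a tab between the key and its joined values.
    END { for (i in A) print i, A[i] }
  '
}

printf 'A\t20\nA\t35\nB\t13\n' | combine_tsv | sort
```

This should print A and B on separate lines, each followed by a tab and then 20,35 or 13 respectively.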
Upvotes: 1
Reputation: 67507
Here is another one; it assumes the input file is already grouped by the first column.
$ awk -v OFS=, 'function pr() {if(p2) print p2; p1=$1; p2=$0}
{if($1==p1) p2=p2 OFS $2; else pr()}
END {pr()}' file
A 20,35
B 13
C 14,49,58
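If the input is not already grouped, one option is to sort on the first column before handing it to the same awk program. This is only a sketch: the function name combine_sorted is made up, and -s (stable sort, so values keep their original order within a key) is assumed to be available, as it is in GNU and BSD sort.

```shell
# Hypothetical helper name; sorts by the first column, then merges adjacent keys.
combine_sorted() {
  sort -s -k1,1 | awk -v OFS=, '
    # Print the finished group (if any) and start a new one from the current line.
    function pr() { if (p2) print p2; p1 = $1; p2 = $0 }
    # Same key as the previous line: append the value; otherwise flush the group.
    { if ($1 == p1) p2 = p2 OFS $2; else pr() }
    END { pr() }
  '
}

printf 'C 14\nA 20\nC 49\nA 35\nB 13\nC 58\n' | combine_sorted
```

Unlike the array-based answers, this holds only one group in memory at a time, at the cost of the extra sort.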
Upvotes: 1
Reputation: 2761
Simply do.
awk '{a[$1] = ($1 in a) ? a[$1] "," $2 : $2} END{for (i in a) print i"\t"a[i]}' infile
Upvotes: 0
Reputation: 689
The problem is that the fields are separated by multiple spaces, not a tab.
awk -F'[[:space:]][[:space:]]+' -v OFS=' ' '{if(a[$1])a[$1]=a[$1]","$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' input.txt > output.txt
Upvotes: 0