user55691

Reputation: 113

Merge lines with the same value in the first column

I have n files that each look like this:

PACKAGE_LIST_DEV=rpm1 rpm2 rpm3
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3

For example:

File1:

PACKAGE_LIST_DEV=rpm1 rpm2 rpm3
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3

File2:

PACKAGE_LIST_DEV=rpm4 rpm5
PACKAGE_LIST_PROD=rpm4 rpm5

File3:

PACKAGE_LIST_DEV=rpm6 rpm7
PACKAGE_LIST_PROD=rpm6 rpm7

and so on.

And I'd like to get the following:

PACKAGE_LIST_DEV=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7

So wherever the key in the first column (the part before the =) is the same across the files, the output should contain one line for that key, with the values from all of the files joined together.

Here's what I've tried:

# Concatenate all files together
cat File1 File2 File3 ... Filen > new_file

PACKAGE_LIST_DEV=rpm1 rpm2 rpm3
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3
PACKAGE_LIST_DEV=rpm4 rpm5
PACKAGE_LIST_PROD=rpm4 rpm5
PACKAGE_LIST_DEV=rpm6 rpm7
PACKAGE_LIST_PROD=rpm6 rpm7

# Join PACKAGE_LIST lines together
awk -F'=' -v OFS='' '{x=$1;$1="=";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' new_file

PACKAGE_LIST_DEV=rpm1 rpm2 rpm3=rpm4 rpm5=rpm6 rpm7
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3=rpm4 rpm5=rpm6 rpm7

As you can see, an extra = appears between the values taken from each file.

Upvotes: 2

Views: 406

Answers (2)

Simon

Reputation: 10841

If the key fields in every file are in sorted order, another option is to use join and sed. You can chain together as many files as you like:

$ join -t= file1 file2 | join -t= - file3 | sed 's/=/ /g;s/ /=/'
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7

... where the | join -t= - file3 part can be included any number of times with different file names, e.g. ...| join -t= - file4 | join -t= - file5... etc.

The awk solution works well and also applies when the key fields are not in sorted order, but it holds the merged contents in memory, so it could run into difficulties with enormous files. As long as the key fields in the files are in sorted order, the join/sed solution works for files of any length.
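Rather than typing out one join per file, the chain can be written as a loop. This is only a sketch: the scratch directory, the acc accumulator file, and the sample contents are made up for illustration, and it assumes every file holds the same keys in sorted order. join keeps only the keys present in both of its inputs, so a key missing from any one file would drop out of the result.

```shell
# Scratch directory with the three sample files from the question.
tmpdir=$(mktemp -d)
printf 'PACKAGE_LIST_DEV=rpm1 rpm2 rpm3\nPACKAGE_LIST_PROD=rpm1 rpm2 rpm3\n' > "$tmpdir/file1"
printf 'PACKAGE_LIST_DEV=rpm4 rpm5\nPACKAGE_LIST_PROD=rpm4 rpm5\n' > "$tmpdir/file2"
printf 'PACKAGE_LIST_DEV=rpm6 rpm7\nPACKAGE_LIST_PROD=rpm6 rpm7\n' > "$tmpdir/file3"

# Start from the first file, then fold each remaining file in with join.
# "acc" is a hypothetical accumulator file name.
cp "$tmpdir/file1" "$tmpdir/acc"
for f in "$tmpdir/file2" "$tmpdir/file3"; do
    join -t= "$tmpdir/acc" "$f" > "$tmpdir/acc.next"
    mv "$tmpdir/acc.next" "$tmpdir/acc"
done

# Turn every "=" into a space, then restore the first one as the key separator.
joined=$(sed 's/=/ /g;s/ /=/' "$tmpdir/acc")
printf '%s\n' "$joined"
```

Writing to acc.next and then renaming avoids reading and truncating the same file within one command.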

Upvotes: 1

Ed Morton

Reputation: 203577

$ awk 'BEGIN{FS=OFS="="} {a[$1]=($1 in a ? a[$1] " " : "") $2} END{for (i in a) print i, a[i]}' file[1-3]
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
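Note that awk's for (i in a) loop visits keys in an unspecified order, which is why PROD is printed before DEV above. If the order matters, one possible variant records each key the first time it is seen and prints in that order. This is a sketch, not part of the original answer: the order array, the counter n, and the scratch files are names introduced here for illustration.

```shell
# Scratch directory with the three sample files from the question.
tmpdir=$(mktemp -d)
printf 'PACKAGE_LIST_DEV=rpm1 rpm2 rpm3\nPACKAGE_LIST_PROD=rpm1 rpm2 rpm3\n' > "$tmpdir/file1"
printf 'PACKAGE_LIST_DEV=rpm4 rpm5\nPACKAGE_LIST_PROD=rpm4 rpm5\n' > "$tmpdir/file2"
printf 'PACKAGE_LIST_DEV=rpm6 rpm7\nPACKAGE_LIST_PROD=rpm6 rpm7\n' > "$tmpdir/file3"

merged=$(awk '
BEGIN { FS = OFS = "=" }
# Remember each key the first time it appears (order and n are illustrative names).
!($1 in a) { order[++n] = $1 }
# Append this value to the key, space-separated, as in the one-liner above.
{ a[$1] = ($1 in a ? a[$1] " " : "") $2 }
# Print the keys in first-seen order instead of the unspecified for-in order.
END { for (i = 1; i <= n; i++) print order[i], a[order[i]] }
' "$tmpdir/file1" "$tmpdir/file2" "$tmpdir/file3")

printf '%s\n' "$merged"
```

With the sample files this prints the PACKAGE_LIST_DEV line first and the PACKAGE_LIST_PROD line second, matching the order in the input.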

Upvotes: 3
