Reputation: 113
I have n files that each look like this:
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3
For example:
File1:
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3
File2:
PACKAGE_LIST_DEV=rpm4 rpm5
PACKAGE_LIST_PROD=rpm4 rpm5
File3:
PACKAGE_LIST_DEV=rpm6 rpm7
PACKAGE_LIST_PROD=rpm6 rpm7
and so on.
And I'd like to get the following:
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
So if the PACKAGE_LIST key in the first column (the part before the =) is the same across files, I want one output line per key, with the values from all files joined together after it.
Here's what I've tried:
# Concatenate all files together
cat File1 File2 File3 ... Filen > new_file
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3
PACKAGE_LIST_DEV=rpm4 rpm5
PACKAGE_LIST_PROD=rpm4 rpm5
PACKAGE_LIST_DEV=rpm6 rpm7
PACKAGE_LIST_PROD=rpm6 rpm7
# Join PACKAGE_LIST lines together
awk -F'=' -v OFS='' '{x=$1;$1="=";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' new_file
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3=rpm4 rpm5=rpm6 rpm7
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3=rpm4 rpm5=rpm6 rpm7
As you can see, there is an extra = between the values from each file where I want a space.
Upvotes: 2
Views: 406
Reputation: 10841
Another alternative, if the key fields in the files are in sorted order, is to use join and sed. To join together as many files as you'd like:
$ join -t= file1 file2 | join -t= - file3 | sed 's/=/ /g;s/ /=/'
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
... where the | join -t= - file3 part can be included any number of times with different file names, e.g. ...| join -t= - file4 | join -t= - file5 ... etc.
The awk solution works well and applies when the key fields are not in sorted order, but it holds the file contents in memory and so could run into difficulties with enormous files. As long as the key fields in the files are in sorted order, the join/sed solution works for files of any length.
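If typing out a join stage per file gets tedious, the same chain can be driven by a loop. This is only a sketch: join_lists is a made-up helper name, and it assumes every file contains the same keys, in sorted order, with one line per key (join silently drops keys that are missing from either input).

```shell
# join_lists is a hypothetical helper, not part of join or sed.
# The ( ... ) body runs in a subshell so the temp-file variables and the
# trap do not leak into the calling shell.
join_lists() (
    acc=$(mktemp) || exit 1
    next=$(mktemp) || exit 1
    # Remove the temp files when the subshell exits, whatever the outcome.
    trap 'rm -f "$acc" "$next"' EXIT

    # Start with the first file, then join each remaining file onto the result.
    cp "$1" "$acc" || exit 1
    shift
    for f in "$@"; do
        join -t= "$acc" "$f" > "$next" || exit 1
        mv "$next" "$acc"
    done

    # Same sed as above: turn every = into a space, then restore the first one.
    sed 's/=/ /g;s/ /=/' "$acc"
)
```

Called as join_lists file1 file2 file3, this should print the same two lines as the pipeline above. It trades the pipe for temp files, so it still does not hold the data in memory.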
Upvotes: 1
Reputation: 203577
$ awk 'BEGIN{FS=OFS="="} {a[$1]=($1 in a ? a[$1] " " : "") $2} END{for (i in a) print i, a[i]}' file[1-3]
PACKAGE_LIST_PROD=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
PACKAGE_LIST_DEV=rpm1 rpm2 rpm3 rpm4 rpm5 rpm6 rpm7
Upvotes: 3