Reputation: 1344
I have 2 files which I need to combine to generate a 3rd file. Please find the samples below:
File 1
xab=p11
aab=p12
aac=p23
xac=p15
yab=p16
File 2
aab=p17
xac=p25
yyc=p22
I would like to preserve the key order of the first file (taking the updated values from the second file) and append the entries that appear only in the second file. The result should be:
File 3
xab=p11
aab=p17
aac=p23
xac=p25
yab=p16
yyc=p22
I tried many ways, but was not able to come up with a simple, easy-to-understand solution. The one I found on Stack Overflow works, but it is hard to understand and explain to a third person. The solution I found was:
cat en_us.txt en_US2.txt | tr -s '\n' | awk -F= '!a[$1]{b[++i]=$1} {a[$1]=$0;} END{for(j=1;j<=i;j++){print a[b[j]]}}'
Can anyone come up with a more readable solution (preferably one not using awk)?
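For reference, the same one-liner can be laid out over several lines with comments; only the layout differs, the behavior is unchanged:
cat en_us.txt en_US2.txt | tr -s '\n' | awk -F= '
  !a[$1] { b[++i] = $1 }                         # first time this key is seen: remember its position
         { a[$1] = $0 }                          # always keep the latest line for this key
  END    { for (j=1; j<=i; j++) print a[b[j]] }  # print the kept lines in first-seen order
'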
Upvotes: 1
Views: 1164
Reputation: 329
Given that the original request was for something more readable than awk, here are a few Tcl solutions.
#!/usr/bin/env tclsh
package require fileutil
foreach file $argv {
fileutil::foreachLine line $file {
lassign [split $line =] key value
dict set data $key $value
}
}
dict for {key value} $data {
puts $key=$value
}
The only two lines that might not seem obvious are:
- lassign, which takes the list in its first argument and creates variables with the names of its remaining arguments (variable destructuring).
- dict set, which adds a new entry to a dictionary (hash map / associative array) named data with the given key/value pair. Tcl will automatically create nonexistent variables the first time they are assigned to.
Tcl dictionaries preserve the insertion order (similar to Ruby, as mentioned in other answers).
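For illustration, here is what those two commands do with one sample input line (a small sketch; line, key, value, and data match the names used in the script above):
set line aab=p17
lassign [split $line =] key value   ;# key is now "aab", value is now "p17"
dict set data $key $value           ;# data now maps aab -> p17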
#!/usr/bin/env tclsh
package require fileutil
foreach file $argv {
fileutil::foreachLine line $file {
dict set data {*}[split $line =]
}
}
dict for {k v} $data {puts $k=$v}
The confusing line here uses the splat operator {*}, which expands (explodes) a list as individual arguments, thus saving us the need to create temporary variables to hold the key/value pairs.
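To make the expansion concrete, a small sketch with one sample line:
set pair [split aab=p17 =]   ;# pair is the two-element list {aab p17}
dict set data {*}$pair       ;# expands to: dict set data aab p17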
For one-liner fans, there are also awk-like tools that run Tcl code. With owh:
cat f1 f2 | owh '' 'dict set data {*}[split $0 =]' 'dict for {k v} $data {puts $k=$v}'
owh uses $0 to represent the whole input line. With tawk:
tawk -F = 'line {dict set data $F(1) $F(2)}; END {dict for {k v} $data {puts $k=$v}}' f1 f2
tawk refers to awk's $1 as $F(1). And a whole-line preserving version (if formatting is more complicated to reproduce than the simple x=y here):
tawk -F = 'line {dict set data $F(1) $F(0)}; END {puts [join [dict values $data] \n]}' f1 f2
Where:
- $F(0) is the whole input line.
- The dict values command returns a list of items like xab=p11, aab=p17, ... which are joined with \n (newline) in the join command (see the short demonstration below).
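A small illustration of that last step (the literal dictionary here just mimics two stored lines):
set data {xab xab=p11 aab aab=p17}
puts [join [dict values $data] \n]
xab=p11
aab=p17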
Upvotes: 1
Reputation: 103874
Since ruby hashes maintain insertion order, you can just maintain a hash of the keys and update a key's value whenever a new one is seen:
ruby -F= -ane 'BEGIN{h=Hash.new()}
h[$F[0]]=$F[1].rstrip
END{h.map{|l| puts l.join("=")}}' f1.txt f2.txt
Prints:
xab=p11
aab=p17
aac=p23
xac=p25
yab=p16
yyc=p22
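The property this relies on is easy to check (a tiny standalone sketch, independent of the files above): re-assigning an existing key updates its value but keeps its original position.
ruby -e 'h = {}; h["aab"] = "p12"; h["yab"] = "p16"; h["aab"] = "p17"; h.each { |k, v| puts "#{k}=#{v}" }'
aab=p17
yab=p16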
Upvotes: 2
Reputation: 23667
Another awk solution:
$ awk -F'=' '{ if($1 in b) a[b[$1]]=$0;
else{a[++i]=$0; b[$1]=i} }
END{for(j=1;j<=i;j++) print a[j]}' f1 f2
xab=p11
aab=p17
aac=p23
xac=p25
yab=p16
yyc=p22
No need for any NR==FNR stuff here; both files are processed the same way:
- else{a[++i]=$0; b[$1]=i} is executed if the first column hasn't been seen before:
  - a[++i]=$0 saves the line content under a numerical key
  - b[$1]=i lets us look up that numerical key from the first column later
- if($1 in b) a[b[$1]]=$0 is executed when the first column already exists:
  - a[b[$1]]=$0 updates the earlier entry in place
- END{for(j=1;j<=i;j++) print a[j]} prints the saved lines after all input lines have been processed
With ruby, it is easier as the insertion order is retained by default.
$ ruby -F'=' -lane 'BEGIN{h={}}; h[$F[0]]=$_; END{puts h.values}' f1 f2
xab=p11
aab=p17
aac=p23
xac=p25
yab=p16
yyc=p22
- BEGIN{h={}} assigns an empty hash to the variable h
- h[$F[0]]=$_ saves the contents of the input line, keyed on the first field
- puts h.values prints the value stored for each hash key
You can save some space by using h[$F[0]]=$F[1] and then END{h.each_key{|k| puts "#{k}=#{h[k]}"}}
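Putting that variant together as a full one-liner (a sketch assuming the same f1 and f2 files as above):
ruby -F'=' -lane 'BEGIN{h={}}; h[$F[0]]=$F[1]; END{h.each_key{|k| puts "#{k}=#{h[k]}"}}' f1 f2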
Upvotes: 10
Reputation: 203665
$ cat tst.awk
BEGIN { FS=OFS="=" }
{ key=$1; val=$2 }
NR==FNR {                       # first file on the command line (file2)
    keys[++numKeys] = key       # remember file2 key order
    key2val[key] = val          # and the value for each key
    next
}
{                               # second file on the command line (file1)
    if ( key in key2val ) {
        val = key2val[key]      # use the updated value from file2
        delete key2val[key]     # mark the key as already printed
    }
    print key, val
}
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        if (key in key2val) {   # keys that only appear in file2
            print key, key2val[key]
        }
    }
}
$ awk -f tst.awk file2 file1
xab=p11
aab=p17
aac=p23
xac=p25
yab=p16
yyc=p22
Upvotes: 4
Reputation: 133538
EDIT: In case you want to maintain the order of both Input_file(s), then try the following.
awk '
BEGIN{
  FS=OFS="="
}
FNR==NR{                                   # reading Input_file2
  if(!($1 in a)){                          # first time this key is seen
    e[++count]=$1                          # remember Input_file2 key order
  }
  a[$1]=$2                                 # remember the Input_file2 value
  next
}
{                                          # reading Input_file1
  print $1,($1 in a?a[$1]:$2)              # prefer the Input_file2 value if present
  c[$1]                                    # mark the key as seen in Input_file1
}
END{
  for(i=1;i<=count;i++){
    if(!(e[i] in c)){ print e[i],a[e[i]] } # keys only in Input_file2, in their order
  }
}
' Input_file2 Input_file1
Could you please try the following, written and tested with the shown samples (this will not preserve the order of the lines that appear only in Input_file2).
awk '
BEGIN{
  FS=OFS="="
}
FNR==NR{                          # reading Input_file2
  a[$1]=$2                        # remember its value for each key
  next
}
{                                 # reading Input_file1
  print $1,($1 in a?a[$1]:$2)     # prefer the Input_file2 value if present
  c[$1]                           # mark the key as seen in Input_file1
}
END{
  for(i in a){
    if(!(i in c)){                # key appeared only in Input_file2
      print i,a[i]
    }
  }
}
' Input_file2 Input_file1
Upvotes: 2
Reputation: 785256
You may use this awk command:
awk 'BEGIN {
  FS=OFS="="
}
FNR == NR {          # reading file2: remember its value for each key
  a[$1] = $2
  next
}
$1 in a {            # reading file1: key is also present in file2
  $2 = a[$1]         # take the file2 value
  delete a[$1]       # so it is not printed again in the END block
}
1                    # always true: print the (possibly updated) line
END {
  for (i in a)       # keys that appeared only in file2
    print i, a[i]
}' file2 file1
xab=p11
aab=p17
aac=p23
xac=p25
yab=p16
yyc=p22
Upvotes: 1