Reputation: 307
I have two files, File1 and File2, and I want to delete unmatched rows based on the first column, i.e. ID.
File1
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
14,LL,JAPAN
File2
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
16,WW,DUBAI
I want to delete these lines:
From File1
14,LL,JAPAN
From File2
16,WW,DUBAI
Using the diff command I am able to find these lines, but I want to delete them.
Using awk I am also able to redirect these lines to another file with the command below.
awk 'FNR==NR{a[$1];next};!($1 in a)' File1 File2 > File3
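(With the sample data above, File3 then contains only the unmatched row from File2, i.e. 16,WW,DUBAI.)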
But I don't know how to delete them in place.
Can we do this using something like the following?
sed -id 's/AWK OUTPUT/g' File1
Thanks
Upvotes: 1
Views: 640
Reputation: 37394
One option using GNU awk's inplace edit (see the linked discussion on versions):
$ awk -i inplace '             # inplace edit modifies the original files
BEGIN { FS="," }               # comma delimited files
NR==FNR {                      # first file: hash all records and set value to 1
    a[$1]++
}
($1 in a) && a[$1]==ARGIND {   # if the id is in a and has been seen in every file read so far
    a[$1]++                    # ... increase the count
    print                      # ... and print the record to the current file
}' file1 file2 file1
Results:
$ cat file1
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
$ cat file2
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
Notice: inplace editing modifies the original files, so beware. If you want to test it without making copies of the original files, you can remove the -i inplace and replace the print with print FILENAME, $0.
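For example, the test version would look roughly like this (it still needs GNU awk, since ARGIND is a gawk extension, but the output goes to the terminal tagged with the file name instead of back into the files):
$ awk '
BEGIN { FS="," }
NR==FNR { a[$1]++ }
($1 in a) && a[$1]==ARGIND { a[$1]++; print FILENAME, $0 }
' file1 file2 file1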
Processing the first file, each id is hashed into a with its count set to 1, which makes the second block print the record back to file1, leaving it in its original form (if there are no duplicate ids). For the second file, ids already found in a have their counts increased and are printed back to file2, and the same applies to file1 when it is processed for the second time.
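With the sample files, the counts in a after all three passes end up as a[ID]=4, a[11]=4, a[12]=4, a[13]=4 and a[14]=2: 14 is only seen in file1, so it never reaches the count required to be printed on the later passes, and 16 is never added at all because it first appears in file2.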
Upvotes: 2
Reputation: 12438
You do not even need to use awk for this operation; grep is more than enough:
$ more file1 file2
::::::::::::::
file1
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
14,LL,JAPAN
::::::::::::::
file2
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
16,WW,DUBAI
$ grep -f <(grep -oP '^[^,]*,' file1) file2 > new_file2; grep -f <(grep -oP '^[^,]*,' file2) file1 > new_file1
$ more new_file*
::::::::::::::
new_file1
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
::::::::::::::
new_file2
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
Explanations:
You use grep -oP to extract from each line the id together with its trailing comma, then call grep again, passing that list of patterns as if it were a file, to analyse the second file; this prints only the matching lines. You do the same with the other file. However, both files are identical at the end of the process, so you do not really need to run grep -f <(grep -oP '^[^,]*,' file2) file1 > new_file1.
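For reference, the inner grep simply builds the list of id patterns; with the sample file1 it produces:
$ grep -oP '^[^,]*,' file1
ID,
11,
12,
13,
14,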
Another way of processing it is with the following command:
$ grep -F -f <(paste -d'\n' <(cut -d',' -f1 file1 | sort -n) <(cut -d',' -f1 file2 | sort -n) | uniq -D | uniq) file1 > new_file1
$ more new_file1
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
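If you are curious what the fixed-string pattern list looks like in this second variant, you can run the inner pipeline on its own; with the sample files it should produce:
$ paste -d'\n' <(cut -d',' -f1 file1 | sort -n) <(cut -d',' -f1 file2 | sort -n) | uniq -D | uniq
ID
11
12
13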
Upvotes: 2
Reputation: 392
Notice I have removed the ! sign:
awk 'FNR==NR{a[$1];next};($1 in a)' File1 File2 | tee File1
You can use the original awk command you have written with two small changes:
1) remove the !
2) pipe the awk result back into the file using the tee command
- the | feeds awk's output into tee, which writes it to File1 as well as to standard output
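If you would rather not overwrite a file while it may still be read through the pipe, the same filter can be written to a temporary file and then moved into place, once per file; this is a defensive variant of the same idea, not part of the original answer:
awk 'FNR==NR{a[$1];next};($1 in a)' File2 File1 > File1.tmp && mv File1.tmp File1
awk 'FNR==NR{a[$1];next};($1 in a)' File1 File2 > File2.tmp && mv File2.tmp File2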
Upvotes: 2