Priyanka

Reputation: 307

Delete rows from 2 files after matching on first column

I have two files, file1 and file2, and I want to delete the unmatched rows based on the first column, i.e. ID.

File1

ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
14,LL,JAPAN

File2

ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
16,WW,DUBAI

I want to delete these lines:

From File1
14,LL,JAPAN 

From File2
16,WW,DUBAI

Using the diff command I am able to find these lines, but I want to delete them.

Also, using awk I am able to redirect these lines to another file with the command below.

awk 'FNR==NR{a[$1];next};!($1 in a)' File1 File2 > File3

But I don't know how to delete them.

Can we do this?

Using sed -id 's/AWK OUTPUT/g' File1
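A sketch of the temp-file route I am considering (assuming GNU/POSIX tools and that the ID is the first comma-separated field; the `.tmp` names are just placeholders):

```shell
# Recreate the sample files from the question
printf 'ID,NAME,ADDRESS\n11,PP,LONDON\n12,SS,BERLIN\n13,QQ,FRANCE\n14,LL,JAPAN\n' > File1
printf 'ID,NAME,ADDRESS\n11,PP,LONDON\n12,SS,BERLIN\n13,QQ,FRANCE\n16,WW,DUBAI\n' > File2

# Keep only rows whose first field (ID) appears in the other file,
# writing to a temp file and then moving it over the original.
awk -F, 'FNR==NR{a[$1];next} $1 in a' File1 File2 > File2.tmp && mv File2.tmp File2
awk -F, 'FNR==NR{a[$1];next} $1 in a' File2 File1 > File1.tmp && mv File1.tmp File1

cat File1 File2
```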

Thanks

Upvotes: 1

Views: 640

Answers (3)

James Brown

Reputation: 37394

One way, using GNU awk's in-place edit (see link for a discussion of versions):

$ awk -i inplace '            # inplace edit modifies the original files
BEGIN { FS="," }              # comma delimited files
NR==FNR {                     # first file, hash all records and set value to 1
    a[$1]++
}
($1 in a) && a[$1]==ARGIND {  # if index in a and it has been in each file read
    a[$1]++                   # ... increase count 
    print                     # ... and print to current file
}' file1 file2 file1

Results:

$ cat file1
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
$ cat file2
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE

Notice: in-place editing modifies the original files, so beware. If you want to test it without using copies of the original files, you can remove the -i inplace and replace the print with print FILENAME, $0.

While processing the first file, each id is hashed into a with its count increased to 1, and the record is printed back to file1, leaving it in its original form (if there are no duplicate ids). For the second file, ids already found in a have their counts increased and are printed back to file2; the same applies to file1 when it is processed a second time.
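The dry run suggested above can be sketched portably as follows; note that argind here is my own counter standing in for gawk's built-in ARGIND, so this version runs under any POSIX awk (the rest of the logic is the answer's):

```shell
# Recreate the sample files
printf 'ID,NAME,ADDRESS\n11,PP,LONDON\n12,SS,BERLIN\n13,QQ,FRANCE\n14,LL,JAPAN\n' > file1
printf 'ID,NAME,ADDRESS\n11,PP,LONDON\n12,SS,BERLIN\n13,QQ,FRANCE\n16,WW,DUBAI\n' > file2

# Dry run: prints FILENAME with each record instead of editing in place,
# so you can see which file each surviving record would be written to.
awk '
BEGIN { FS="," }
FNR==1 { argind++ }            # count files as they are opened (replaces gawk ARGIND)
NR==FNR { a[$1]++ }            # first file: hash every id
($1 in a) && a[$1]==argind {   # id has appeared in every file read so far
    a[$1]++
    print FILENAME, $0
}' file1 file2 file1 > dryrun.txt

cat dryrun.txt
```

Records missing from either file (14 and 16 here) fall behind the argind counter and are never printed.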

Upvotes: 2

Allan

Reputation: 12438

You do not even need awk for this operation; grep is more than enough:

$ more file1 file2
::::::::::::::
file1
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
14,LL,JAPAN
::::::::::::::
file2
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
16,WW,DUBAI

$ grep -f <(grep -oP '^[^,]*,' file1) file2 > new_file2; grep -f <(grep -oP '^[^,]*,' file2) file1 > new_file1

$ more new_file*
::::::::::::::
new_file1
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE
::::::::::::::
new_file2
::::::::::::::
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE

Explanations:

grep -oP extracts the id plus its trailing comma from each line; that list is then passed via -f to a second grep, as if it were a pattern file, to analyse the other file and print only its matching lines. The same is done with the files swapped.

However, both files are identical at the end of the process, so you do not actually need to run the second command, grep -f <(grep -oP '^[^,]*,' file2) file1 > new_file1.

Another way of processing is using the following commands:

$ grep -F -f <(paste -d'\n' <(cut -d',' -f1 file1 | sort -n) <(cut -d',' -f1 file2 | sort -n) | uniq -D | uniq) file1 > new_file1

$ more new_file1
ID,NAME,ADDRESS
11,PP,LONDON
12,SS,BERLIN
13,QQ,FRANCE

Upvotes: 2

shaiki siegal

Reputation: 392

An answer using the tee command:

Notice I have removed the ! sign:

 awk 'FNR==NR{a[$1];next};($1 in a)' File1 File2 | tee File1 

You can use the original awk command you have written, with two small changes:

1) remove the !

2) pipe the awk result back to the file using the tee command. Note that the commands on either side of | start at the same time, not one after the other, so this relies on awk reading File1 before tee truncates it.
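Because tee File1 truncates File1 as soon as the pipeline starts, while awk may still be reading it, writing to a temporary file and moving it back is the safer variant of this answer (a sketch; common.tmp is a placeholder name). Also note that without FS="," awk's $1 is the whole line here, since the rows contain no whitespace, so this compares entire records rather than just the ID:

```shell
# Recreate the sample files
printf 'ID,NAME,ADDRESS\n11,PP,LONDON\n12,SS,BERLIN\n13,QQ,FRANCE\n14,LL,JAPAN\n' > File1
printf 'ID,NAME,ADDRESS\n11,PP,LONDON\n12,SS,BERLIN\n13,QQ,FRANCE\n16,WW,DUBAI\n' > File2

# Same awk logic as the answer, but the output goes to a temp file
# first, avoiding the race between tee truncating and awk reading File1.
awk 'FNR==NR{a[$1];next} ($1 in a)' File1 File2 > common.tmp && mv common.tmp File1

cat File1
```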

Upvotes: 2
