JohnnyKing94

Reputation: 61

Awk and sed command at the same time

I have two files: an English file (the source) and an Italian file (the target). Both have the same number of lines. I run awk 'NF<3' on the Italian file to remove every line that has more than 2 words, and I would also like to remove the corresponding lines from the English file. I thought I could work by line number: since the text in the two files is different, I would need a sed command that deletes the same line numbers from the source file. I do not know how to do that while awk is filtering the Italian file, because once the command has run, the line numbers in the two files no longer match.

Example

EN
1 Santa Claus
2 Pigs don't fly
3 The son of the father
4 Elf

IT
1 Babbo Natale
2 I maiali non volano
3 Il figlio del padre
4 Elfo

I run awk on IT file
OUTPUT FILE
IT
1 Babbo Natale
4 Elfo

The lines removed by awk from the IT file also need to be removed from the EN file. I can't simply run awk again on the EN file, because its word counts differ from those in the IT file; it has to be done purely by line number.

THE OUTPUT EN FILE MUST BE
1 Santa Claus
2 Elf

Any suggestions? If it's not clear, please ask...

Upvotes: 1

Views: 775

Answers (2)

potong

Reputation: 58578

This might work for you (GNU sed):

sed -rn 's/\S+//3;T;=' fileIT | sed 's/.*/&d/' | sed -f - fileEN

This uses the IT file to generate a sed script that is then run against the EN file. The first sed invocation outputs the line number of every line in the IT file that has three or more words. The second sed invocation turns each line number into a sed command that deletes that line. The third sed invocation runs those commands against the EN file, deleting the same line numbers there.
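To make the three stages easier to follow, here is a rough sketch that writes the generated script to a file before applying it. It assumes GNU sed (for -r, \S and the T command) and the sample data from the question; the names delete.sed and filtered_en are made up for this illustration and are not part of the one-liner above.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Recreate the sample files from the question.
printf '%s\n' 'Babbo Natale' 'I maiali non volano' 'Il figlio del padre' 'Elfo' > fileIT
printf '%s\n' 'Santa Claus' "Pigs don't fly" 'The son of the father' 'Elf' > fileEN

# Stages 1 and 2: number of every IT line with three or more words,
# rewritten as an "Nd" delete command (delete.sed is a hypothetical name).
sed -rn 's/\S+//3;T;=' fileIT | sed 's/.*/&d/' > delete.sed
cat delete.sed        # prints 2d and 3d, one per line

# Stage 3: apply the generated script to the EN file.
sed -f delete.sed fileEN > filtered_en
cat filtered_en       # prints Santa Claus and Elf, one per line
```

Note that this only filters the EN file; the IT file still has to be filtered separately, for example with the awk 'NF<3' command from the question.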

Upvotes: 0

Juan Diego Godoy Robles

Reputation: 14975

Having as source files:

$ cat it.dat 
Babbo Natale
I maiali non volano
Il figlio del padre
Elfo

$ cat en.dat 
Santa Claus
Pigs don't fly
The son of the father
Elf

This awk:

awk 'NR==FNR{if(NF<3){a[FNR]=1;print > "filtered_it.dat"};next}   # it.dat: keep lines with fewer than 3 words
     a[FNR]{print > "filtered_en.dat"}' it.dat en.dat

Results

$ cat filtered_it.dat
Babbo Natale
Elfo
$ cat filtered_en.dat 
Santa Claus
Elf
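A possible variant, if you would rather not store line numbers in an array: read the two files in lockstep with getline, so each Italian line is paired with the English line of the same number. This is only a sketch under the assumption that both files have the same number of lines, as stated in the question; the output names it.out and en.out are hypothetical.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"
printf '%s\n' 'Babbo Natale' 'I maiali non volano' 'Il figlio del padre' 'Elfo' > it.dat
printf '%s\n' 'Santa Claus' "Pigs don't fly" 'The son of the father' 'Elf' > en.dat

awk -v en=en.dat '
{
    # Pull the line with the same number from the English file.
    # Reading into a variable leaves $0 and NF of the Italian line untouched.
    if ((getline en_line < en) <= 0) exit
    # Keep the pair only when the Italian line has fewer than 3 words.
    if (NF < 3) {
        print         > "it.out"
        print en_line > "en.out"
    }
}' it.dat

cat it.out    # prints Babbo Natale and Elfo, one per line
cat en.out    # prints Santa Claus and Elf, one per line
```

Because only one line per file is held at a time, this should also behave reasonably on large files.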

Upvotes: 4
