Angelo

Reputation: 5059

not equal to operator with awk

I am not sure what I am doing wrong, but I am certainly making some mistake with my awk command.

I have two files. FileA contains names:

FileA

Abhi
Roma
GiGi
KaKa

FileB contains other data with names

Abhi 23  Pk
DaDa 43  Gk
Roma 33  Kk
PkPk 22  Aa

Now, I am trying to print the details of all the names that are absent from FileA.

for i in `cat FileA` ; do cat FileB | awk '{ if ($1!='$i') print $0_}'>> Result; done

What I get is

Abhi    23  Pk
DaDa    43  Gk
Roma    33  Kk
PkPk    22  Aa
Abhi    23  Pk
DaDa    43  Gk
Roma    33  Kk
PkPk    22  Aa
Abhi    23  Pk
DaDa    43  Gk

Desired output

DaDa 43  Gk
PkPk 22  Aa

Could anyone help me find the error?

Thank you

Upvotes: 3

Views: 63138

Answers (4)

RARE Kpop Manifesto

Reputation: 2865

mawk 'NR==FNR ? __[$_] : $!_ in __==_' <( printf '%s' "$test1" ) \
                                       <( printf '%s' "$test2" )
DaDa 43  Gk
PkPk 22  Aa

Or, without the ternary operator:

gawk '$!_ in __ != (FNR < NR || __[$_])' 
DaDa 43  Gk
PkPk 22  Aa

Upvotes: 0

Fedor

Reputation: 86

This task looks like the classic two-file processing pattern:

# prints the lines of fileB whose 1st column does not appear in fileA
$ awk 'NR == FNR{a[$1];next} !($1 in a) ' fileA fileB

Here:

  • NR==FNR is true only while reading the 1st file.
  • a[$1] - creates an array element keyed by the 1st column of fileA. a[$0] would do the same in this example, since $0==$1. One could also write ++a[$1] to count duplicates at the same time, or a[$1]=$2 to store some extra info.
  • next - stops further processing of the current line while reading the 1st file, i.e. fileA.
  • !($1 in a) - this part only runs while reading fileB, and it prints a line only when a[$1] does not exist, i.e. there is no element with a key equal to $1. Note that it's equivalent to !($1 in a) {print $0}, so the printing format could be modified if desired.
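
The pattern above can be tried end to end with the sample data from the question. This is only a minimal sketch: the scratch directory and the variable names tmp and result are assumptions made for illustration, not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' Abhi Roma GiGi KaKa > "$tmp/fileA"
printf '%s\n' 'Abhi 23  Pk' 'DaDa 43  Gk' 'Roma 33  Kk' 'PkPk 22  Aa' > "$tmp/fileB"

# 1st file: remember every name as an array key, then skip to the next line.
# 2nd file: print only the lines whose 1st column is NOT one of those keys.
result=$(awk 'NR == FNR { a[$1]; next } !($1 in a)' "$tmp/fileA" "$tmp/fileB")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With the question's data this should print the DaDa and PkPk lines. One caveat worth knowing: if fileA is empty, NR==FNR is also true while reading fileB, so nothing is printed.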

Upvotes: 1

AwkMan

Reputation: 670

The problem is that when you want to compare with a string, that string must be in double quotes; otherwise, awk treats it as a variable name.

For example:

awk '{ if ($1!=name) print $0_}'

In this case, awk will assume that "name" is a variable, which will be empty, as no value has been assigned to it, and hence, compare $1 with an empty string.

awk '{ if ($1!="name") print $0_}'

In this case, awk will compare $1 with the string "name".
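
The difference can be checked directly on two of the question's lines. This is a small sketch; the variable names unquoted and quoted are made up for the illustration.

```shell
#!/bin/sh
# An unquoted word inside an awk program is a variable. It is unset here,
# so $1 is compared with the empty string and every line passes the test.
unquoted=$(printf '%s\n' 'Abhi 23  Pk' 'DaDa 43  Gk' | awk '$1 != Abhi')

# A double-quoted word is a string constant, so the Abhi line is filtered out.
quoted=$(printf '%s\n' 'Abhi 23  Pk' 'DaDa 43  Gk' | awk '$1 != "Abhi"')

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
```

The first command should print both lines, while the second should print only the DaDa line. This is the same effect that made the loop in the question print every line of FileB.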

Therefore, the comparison in your command should be written as:

for i in `cat FileA` ; do cat FileB | awk -v var="$i" '{ if ($1!=var) print $0_}'>> Result; done

This will also work, though I think the previous way is clearer:

for i in `cat FileA` ; do cat FileB | awk '{ if ($1!="'$i'") print $0_}'>> Result; done

EDIT: Check fedorqui's answer for a better approach to the solution.

Upvotes: 3

fedorqui

Reputation: 290025

For this you just need grep:

$ grep -vf fileA fileB
DaDa 43  Gk
PkPk 22  Aa

This reads the patterns from fileA (-f). Then, -v inverts the match, so only the lines of fileB that match none of the patterns are printed.
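
Note that grep treats each line of the pattern file as a regular expression that may match anywhere in the line, not only in the first column. The sketch below shows the effect with a hypothetical extra name, Da, which is not in the question's data; the file name names and the variables loose and strict are also assumptions. The -w option is not required by POSIX, though GNU and BSD grep both support it.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# "Da" is a hypothetical extra name, added only to show the substring effect.
printf '%s\n' Abhi Roma Da > "$tmp/names"
printf '%s\n' 'Abhi 23  Pk' 'DaDa 43  Gk' 'Roma 33  Kk' 'PkPk 22  Aa' > "$tmp/fileB"

# Plain -f: each name is a regular expression matched anywhere in the line,
# so "Da" also removes the DaDa line.
loose=$(grep -vf "$tmp/names" "$tmp/fileB")

# -F treats the names as fixed strings and -w requires whole-word matches,
# so "Da" no longer matches inside "DaDa".
strict=$(grep -vwFf "$tmp/names" "$tmp/fileB")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

Even with -w and -F, a name that appears as a whole word in another column would still exclude that line. When the comparison must be limited to the first column, an awk test on $1 is the safer choice.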

AwkMan addresses very well why you are not matching lines properly. Now, let's see where your solution needs polishing:

Your code is:

for i in `cat FileA`
do
    cat FileB | awk '{ if ($1!='$i') print $0_}'>> Result
done

The post "Why you don't read lines with 'for'" explains the problem with this loop well. So you would need something like what is described in "Read a file line by line assigning the value to a variable":

while IFS= read -r line
do
    cat FileB | awk '{ if ($1!='$line') print $0_}'>> Result
done < fileA

Then, you are saying cat file | awk '...'. For this, awk '...' file is enough:

while IFS= read -r line
do
    awk '{ if ($1!='$line') print $0_}' FileB >> Result
done < fileA

Also, the redirection can be placed after done, so the command is clearer:

while IFS= read -r line
do
    awk '{ if ($1!='$line') print $0_}' FileB
done < fileA >> Result

Calling awk once per name is wasteful; you can use the FNR==NR trick to process both files in a single awk call.

Now let's look at the awk part. Here you want to compare against a variable. However, awk knows nothing about shell variables such as $i or $line.

Also, when you have a statement like:

awk '{if (condition) print $0}' file

It is the same as saying:

awk 'condition' file

Because {print $0} is the default action to perform when a condition evaluates to true.
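
As a quick check of this shorthand, the sketch below runs both forms on two of the question's lines. The condition $2 > 30 and the variable names data, long and short are chosen only for illustration.

```shell
#!/bin/sh
data=$(printf '%s\n' 'Abhi 23  Pk' 'DaDa 43  Gk')

# Explicit form: test the condition inside an action block and print.
long=$(printf '%s\n' "$data" | awk '{ if ($2 > 30) print $0 }')

# Shorthand: a bare condition with no action prints the matching lines.
short=$(printf '%s\n' "$data" | awk '$2 > 30')

printf 'long:  %s\nshort: %s\n' "$long" "$short"
```

Both commands should print only the DaDa line.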

Also, to let awk use a bash variable you need to use awk -v var="$shell_var" and then use var internally.

All together, you should say something like:

while IFS= read -r line
do
    awk -v var="$line" '$1 != var' FileB
done < fileA >> Result

But since this runs awk over FileB once for every name in fileA, it prints the lines many, many times: each pass prints every line whose name differs from that one name. That's why you have to go all the way up to the top of this answer and use grep -vf fileA fileB.
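
To make the repetition concrete, the sketch below counts the lines produced by the loop and by the grep command on the question's sample data. The scratch directory and the names looped, loop_count, single and single_count are assumptions for the illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' Abhi Roma GiGi KaKa > "$tmp/fileA"
printf '%s\n' 'Abhi 23  Pk' 'DaDa 43  Gk' 'Roma 33  Kk' 'PkPk 22  Aa' > "$tmp/fileB"

# The loop: one awk pass over fileB per name in fileA.
looped=$(while IFS= read -r line
do
    awk -v var="$line" '$1 != var' "$tmp/fileB"
done < "$tmp/fileA")
loop_count=$(printf '%s\n' "$looped" | awk 'END { print NR }')

# A single pass that excludes every line matching any name.
single=$(grep -vf "$tmp/fileA" "$tmp/fileB")
single_count=$(printf '%s\n' "$single" | awk 'END { print NR }')

printf 'loop: %s lines, grep: %s lines\n' "$loop_count" "$single_count"
rm -rf "$tmp"
```

With this data the loop prints 3 lines each for Abhi and Roma, and all 4 lines each for GiGi and KaKa (which never match), for a total of 14. The grep command prints only the 2 desired lines.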

Upvotes: 10
