Taieb

Reputation: 920

Search one file's lines for a partial match in another file

I have two files. The first one:

values.txt

test@
test1@
test3@
test4@
test6@
test7@    
test8@
test9@
test10@

data.csv

"username","email"
"user","[email protected]"
"user1","[email protected]"
"user2","[email protected]"
"user4","[email protected]"
"user456","[email protected]"
"user789","[email protected]"
"user5","[email protected]"
"user","[email protected]"
"user5","[email protected]"
"user","[email protected]"

I want the output to be like this:

"user","[email protected]"
"user1","[email protected]"
"user2","[email protected]"
"user4","[email protected]"
"user5","[email protected]"
"user5","[email protected]"

What I have so far:

$ awk -F, -v q='"' 'NR==FNR{a[q $0 q]; next} 
                    $2 in a' values.txt data.csv > test1.csv

This works only when values.txt contains the full email (e.g. [email protected]) rather than just the prefix test9@. In that case it writes a new file, test1.csv, containing:

"user5","[email protected]"
 ....
 ....

I couldn't figure out how to do this with awk when values.txt holds only a partial substring.

Upvotes: 1

Views: 521

Answers (2)

RavinderSingh13

Reputation: 133780

Could you please try the following, written and tested with the shown samples in GNU awk. A few of your lines appear to have trailing spaces, so I have added gsub(/ +$/,"") to strip them before the contents of the two files are compared.

awk '
{ gsub(/ +$/,"") }
FNR==NR{
  arr[$0]
  next
}
{
  for(key in arr){
    if(index($2,key)){
      print
      next
    }
  }
}' values.txt FS="," data.csv

Explanation: a detailed, commented version of the above.

awk '                               ##Start of the awk program.
{ gsub(/ +$/,"") }                  ##Remove trailing spaces from every line of both files.
FNR==NR{                            ##This condition is TRUE only while values.txt is being read.
  arr[$0]                           ##Store the current line as an index (key) of arr.
  next                              ##Skip the remaining statements and read the next line.
}
{
  for(key in arr){                  ##Loop over every key stored in arr.
    if(index($2,key)){              ##Check whether key occurs as a substring of the 2nd field.
      print                         ##Print the current line.
      next                          ##Stop at the first matching key and read the next line.
    }
  }
}' values.txt FS="," data.csv       ##Input files; FS="," takes effect before data.csv is read.
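
As a side note, index($2,key) is true when the key occurs anywhere in the field, so a key such as test@ would also match an address like mytest@. The following is only a sketch of one way to anchor the match to the start of the address, assuming the email is always the second field and always begins with a double quote (so a prefix match lands at position 2). The example.com addresses and the temporary directory are made up for the demonstration, because the addresses in the question are not visible.

```shell
#!/bin/sh
# Hypothetical sample data; the example.com addresses are placeholders.
tmp=$(mktemp -d)
printf '%s\n' 'test@' 'test1@   ' 'test9@' > "$tmp/values.txt"
cat > "$tmp/data.csv" <<'EOF'
"username","email"
"user","test@example.com"
"user1","test1@example.com"
"user2","mytest@example.com"
"user5","test10@example.com"
EOF

result=$(awk '
{ sub(/[ \t\r]+$/, "") }                  # trim trailing blanks (and a stray CR)
FNR == NR {                               # first file: values.txt
  if ($0 != "") arr[$0]                   # skip empty lines so no empty key is stored
  next
}
{
  for (key in arr) {
    if (index($2, key) == 2) {            # key must start right after the opening quote
      print
      next
    }
  }
}' "$tmp/values.txt" FS="," "$tmp/data.csv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data only the "user" and "user1" rows are printed; the mytest@ row is skipped because test@ starts at position 4 of the field, not 2.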

Upvotes: 1

anubhava

Reputation: 786359

You may use this awk:

awk -F, 'NR==FNR {a[$1]; next} {ea = $2; gsub(/^"|@.*$/, "", ea)} ea "@" in a' values.txt data.csv

"user","[email protected]"
"user1","[email protected]"
"user2","[email protected]"
"user4","[email protected]"
"user5","[email protected]"
"user5","[email protected]"

A more readable version:

awk -F, 'NR == FNR {
   a[$1]                   # from values.txt store each value in array a
   next
}
{
   ea = $2                 # copy $2 into ea (email address)
   gsub(/^"|@.*$/, "", ea) # strip the leading " and everything from @ onward
}
ea "@" in a                # check if ea + "@" exists in array a
' values.txt data.csv
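
One thing to watch: the sample values.txt in the question has trailing spaces on at least one line (test7@), and an exact lookup with "in" would then never find that key. Below is a minimal sketch of the same lookup with the keys trimmed first. It is an illustration under assumptions, not a tested drop-in: the example.com addresses, the variable name key, and the temporary directory are all invented for the demo.

```shell
#!/bin/sh
# Hypothetical sample data; note the trailing spaces after test7@.
tmp=$(mktemp -d)
printf '%s\n' 'test@' 'test7@    ' > "$tmp/values.txt"
cat > "$tmp/data.csv" <<'EOF'
"username","email"
"user","test@example.com"
"user7","test7@example.com"
"user8","test8@example.com"
EOF

matched=$(awk -F, '
NR == FNR {
   key = $1
   gsub(/[ \t\r]+$/, "", key)   # trim trailing whitespace from the stored value
   if (key != "") a[key]        # ignore blank lines in values.txt
   next
}
{
   ea = $2                      # copy the email field
   gsub(/^"|@.*$/, "", ea)      # keep only the part before @, without the quote
}
(ea "@") in a                   # exact lookup of the prefix plus @
' "$tmp/values.txt" "$tmp/data.csv")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

Here the "user" and "user7" rows are printed. Without the trimming step, the test7@ key would be stored with its trailing spaces and the "user7" row would be missed.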

Upvotes: 3
