Reputation: 920
I have two files. The first one is values.txt:
test@
test1@
test3@
test4@
test6@
test7@
test8@
test9@
test10@
The second one is data.csv:
"username","email"
"user","[email protected]"
"user1","[email protected]"
"user2","[email protected]"
"user4","[email protected]"
"user456","[email protected]"
"user789","[email protected]"
"user5","[email protected]"
"user","[email protected]"
"user5","[email protected]"
"user","[email protected]"
I want the output to be like this:
"user","[email protected]"
"user1","[email protected]"
"user2","[email protected]"
"user4","[email protected]"
"user5","[email protected]"
"user5","[email protected]"
What I have been able to do so far:
$ awk -F, -v q='"' 'NR==FNR{a[q $0 q]; next}
$2 in a' values.txt data.csv > test1.csv
This works only when values.txt contains the full email address (e.g. [email protected]), not just a prefix such as test9@.
In that case it produces a new file test1.csv containing:
"user5","[email protected]"
....
....
I couldn't figure out how to match on a partial substring with awk.
Upvotes: 1
Views: 521
Reputation: 133780
Could you please try the following, written and tested with the shown samples in GNU awk. It looks like a few of your lines have trailing spaces. In case you want to remove them before matching the contents of the two files, I have added gsub(/ +$/,"") to my solution.
awk '
{ gsub(/ +$/,"") }
FNR==NR{
arr[$0]
next
}
{
for(key in arr){
if(index($2,key)){
print
next
}
}
}' values.txt FS="," data.csv
Explanation: a detailed, commented version of the above.
awk ' ##Start of the awk program.
{ gsub(/ +$/,"") } ##Remove trailing spaces from every line.
FNR==NR{ ##TRUE only while the first file (values.txt) is being read.
arr[$0] ##Store the current line as a key of arr.
next ##Skip the remaining rules for this line.
}
{
for(key in arr){ ##Loop over every key stored from values.txt.
if(index($2,key)){ ##TRUE if key occurs anywhere in the 2nd field.
print ##Print the current line.
next ##Move on to the next line after the first match.
}
}
}' values.txt FS="," data.csv ##Input files; FS="," takes effect only for data.csv.
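Note that index($2,key) is true when the key occurs anywhere in the field, so a key such as test1@ would also match an address like xtest1@. The sketch below is one possible tightening, not part of the answer above: it requires the key to start right after the opening quote by testing index(...) == 2. The temporary files and the example.com / example.org addresses are made up for illustration, since the real addresses are not visible in the question.

```shell
# Hypothetical sample data; the addresses are placeholders, not the asker's real ones.
tmp=$(mktemp -d)
printf '%s\n' 'test1@' 'test9@' > "$tmp/values.txt"
printf '%s\n' \
  '"username","email"' \
  '"user1","test1@example.com"' \
  '"user2","xtest1@example.com"' \
  '"user5","test9@example.org"' \
  '"user7","other@example.com"' > "$tmp/data.csv"

result=$(awk -F, '
FNR == NR { keys[$0]; next }      # first file: remember each prefix
{
  for (key in keys)
    if (index($2, key) == 2) {    # key must start right after the opening quote
      print
      next                        # stop at the first matching key
    }
}' "$tmp/values.txt" "$tmp/data.csv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data, the row containing xtest1@ is skipped because the key is found at position 3 rather than 2.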
Upvotes: 1
Reputation: 786359
You may use this awk:
awk -F, 'NR==FNR {a[$1]; next} {ea = $2; gsub(/^"|@.*$/, "", ea)} ea "@" in a' values.txt data.csv
"user","[email protected]"
"user1","[email protected]"
"user2","[email protected]"
"user4","[email protected]"
"user5","[email protected]"
"user5","[email protected]"
A more readable version:
awk -F, 'NR == FNR {
a[$1] # from values.txt store each value in array a
next
}
{
ea = $2 # copy $2 into ea (email address)
gsub(/^"|@.*$/, "", ea) # strip the leading " and everything from @ onward
}
ea "@" in a # check if ea + "@" exists in array a
' values.txt data.csv
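To try this lookup-based approach without the original files, here is a minimal self-contained sketch. The temporary files and the example.com / example.org addresses are assumptions made up for illustration, and the parentheses around ea "@" are added only for readability. Because the local part of each address is rebuilt and looked up as an exact array key, test9@ does not match an address starting with test99@.

```shell
# Hypothetical sample data; the addresses are placeholders, not the asker's real ones.
tmp=$(mktemp -d)
printf '%s\n' 'test@' 'test9@' > "$tmp/values.txt"
printf '%s\n' \
  '"username","email"' \
  '"user","test@example.com"' \
  '"user3","test99@example.com"' \
  '"user5","test9@example.org"' > "$tmp/data.csv"

matched=$(awk -F, '
NR == FNR { a[$1]; next }       # values.txt: store each prefix as an array key
{
  ea = $2                       # copy the email field
  gsub(/^"|@.*$/, "", ea)       # keep only the local part (text before @)
}
(ea "@") in a                   # print the row if local part + "@" is a stored key
' "$tmp/values.txt" "$tmp/data.csv")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

Each row costs one array lookup here, instead of a loop over every key in values.txt.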
Upvotes: 3