Reputation: 9
I've tried many combinations of grep and awk commands to process text from a file.
This is a list of customers in the following format:
John,Mills,81,Crescent,New York,NY,[email protected],19/02/1954
I am trying to separate these records into two categories, male and female.
I have a list of some 5000 female first names, in plain text, all in one file.
How can I "grep" only the first column (since I am matching first names only) while still printing the entire customer record?
I found it easy to "cut" the first column and run grep --file=female.names.txt on it, but that way the rest of the record is no longer printed.
I am aware of the awk option, but in that case I don't know how to read the female names from a file.
awk -F ',' ' { if($1==" ???Filename??? ") print $0} '
Many thanks!
Upvotes: 1
Views: 1532
Reputation: 547
So, I've come up with the following:
Suppose you have a file named test.txt containing the following lines:
abe 123 bdb 532
xyz 593 iau 591
Now you want to find the lines whose first field begins and ends with a vowel. A simple, unanchored grep for that pattern would return both lines, but the following returns only the first line, which is the desired output:
egrep "^([0-z]{1,} ){0}[aeiou][0-z]+[aeiou]" test.txt
Next, you want to find the lines whose third field begins and ends with a vowel. Similarly, a simple grep would return both lines, but the following returns only the second line, which is the desired output:
egrep "^([0-z]{1,} ){2}[aeiou][0-z]+[aeiou]" test.txt
The {1,} in the first pair of curly braces means that the preceding character class, which covers the ASCII range from 0 to z, must occur one or more times. It is followed by the field separator, a space in this case. Change the value in the second pair of curly braces ({0} or {2} above) to the desired field number minus 1. Then add a regular expression that expresses your criteria.
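The "field number minus 1" rule can be wrapped in a small shell function. This is only a sketch under a few assumptions: match_field is a hypothetical helper name, fields are separated by single spaces, and it uses [^ ] in place of [0-z] plus a trailing ( |$) so that the pattern has to cover the whole field. That is a slight variation on the commands above, not the same regex.

```shell
# Work in a scratch directory with the two sample lines from this answer.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' 'abe 123 bdb 532' 'xyz 593 iau 591' > test.txt

# match_field N PATTERN FILE (hypothetical helper, not a standard command):
# print the lines of FILE whose Nth space-separated field matches PATTERN entirely.
match_field() {
    skip=$(( $1 - 1 ))                       # number of fields to step over
    grep -E "^([^ ]+ ){$skip}$2( |\$)" "$3"
}

vowels='[aeiou][^ ]*[aeiou]'                 # two or more characters, vowel at both ends
first=$(match_field 1 "$vowels" test.txt)
third=$(match_field 3 "$vowels" test.txt)
printf 'field 1: %s\nfield 3: %s\n' "$first" "$third"
```

With the sample file, the first call selects the abe line and the second selects the xyz line, matching the two commands above.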
Upvotes: 0
Reputation: 223023
Another alternative is Perl, which can be useful if you're not super-familiar with awk.
#!/usr/bin/perl -anF,
use strict;
our %names;
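# Read the names file (the only command-line argument) before the -n loop starts.
BEGIN {
    while (<ARGV>) {
        chomp;
        $names{$_} = 1;
    }
}
# -n then loops over standard input; -a with -F, splits each record into @F.
print if $names{$F[0]};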
To run it (assuming you named this file filter.pl):
perl filter.pl female.names.txt < records.txt
Upvotes: 0
Reputation: 3646
You can do this with Awk:
awk -F, 'NR==FNR{a[$0]; next} ($1 in a)' female.names.txt file.csv
This prints the lines of your CSV file whose first field is one of the names in female.names.txt.
awk -F, 'NR==FNR{a[$0]; next} !($1 in a)' female.names.txt file.csv
This prints the lines whose first field is not in female.names.txt.
This assumes your female.names.txt file has one name per line, something like:
Heather
Irene
Jane
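If both categories are needed, the two commands can be folded into a single pass that writes each record to one of two files. The following is a minimal sketch rather than a tested production script: the sample records, the example.com addresses, and the output names female.csv and other.csv are all made up for illustration.

```shell
# Work in a scratch directory with hypothetical sample data.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf '%s\n' Heather Irene Jane > female.names.txt
cat > file.csv <<'EOF'
John,Mills,81,Crescent,New York,NY,john@example.com,19/02/1954
Jane,Doe,12,High Street,Boston,MA,jane@example.com,03/07/1961
Janet,Roe,5,Low Road,Austin,TX,janet@example.com,11/11/1970
EOF

# First file: remember every name. Second file: route each record by its first field.
awk -F, '
    NR == FNR { names[$0]; next }            # still reading female.names.txt
    { print > (($1 in names) ? "female.csv" : "other.csv") }
' female.names.txt file.csv

cat female.csv
```

Because the lookup is an exact comparison on the first field, Janet is not treated as a match for Jane and ends up in other.csv along with John.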
Upvotes: 4
Reputation: 781028
Try this:
grep --file=<(sed 's/.*/^&,/' female.names.txt) datafile.csv
This turns each name in the list of female names into the regular expression ^name, so that it matches only at the beginning of a line and only when followed by a comma. It then uses process substitution to pass the result to grep as the pattern file to match against the data file.
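Process substitution needs a shell such as bash. In a plain POSIX shell, roughly the same effect can be had by writing the generated patterns to a temporary file first, which also makes it easy to see what grep is given. This is a sketch with made-up sample records; patterns.txt is a hypothetical file name.

```shell
# Work in a scratch directory with hypothetical sample data.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf '%s\n' Heather Irene Jane > female.names.txt
cat > datafile.csv <<'EOF'
John,Mills,81,Crescent,New York,NY,john@example.com,19/02/1954
Jane,Doe,12,High Street,Boston,MA,jane@example.com,03/07/1961
Janet,Roe,5,Low Road,Austin,TX,janet@example.com,11/11/1970
EOF

# Same transformation as in the one-liner, but saved so it can be inspected.
sed 's/.*/^&,/' female.names.txt > patterns.txt
cat patterns.txt                             # one anchored pattern per name, e.g. ^Jane,

# -f reads one pattern per line from the file.
matches=$(grep -f patterns.txt datafile.csv)
printf '%s\n' "$matches"
```

The trailing comma in each pattern is what keeps ^Jane, from matching the Janet record. Names containing regex metacharacters would need escaping, which this sketch does not attempt.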
Upvotes: 0