Reputation: 9

How to use grep or awk to process a specific column (with keywords from a text file)

I've tried many combinations of grep and awk commands to process text from a file.

The file is a list of customer records like this one:

John,Mills,81,Crescent,New York,NY,[email protected],19/02/1954

I am trying to separate these records into two categories, male and female.

I have a list of some 5000 female names, all in plain text, all in one file.

How can I "grep" only the first column (since I am matching first names only) but still print the entire customer record?

I found it easy to "cut" the first column and run grep --file=female.names.txt on it, but that way the rest of the record is no longer printed.

I am aware of the awk option, but in that case I don't know how to read the female names from a file.

awk -F ',' ' { if($1==" ???Filename??? ") print $0} '

Many thanks !

Upvotes: 1

Answers (4)

Ishrak

Reputation: 547

So, I've come up with the following:

Suppose you have a file named test.txt containing the following lines:

abe 123 bdb 532

xyz 593 iau 591

Now you want to find the lines whose first field begins and ends with a vowel. A simple grep for vowels would return both lines, but the following returns only the first line, which is the desired output:

egrep "^([0-z]{1,} ){0}[aeiou][0-z]+[aeiou]" test.txt

Then you want to find the lines whose third field begins and ends with a vowel. Similarly, a simple grep would return both lines, but the following returns only the second line, which is the desired output:

egrep "^([0-z]{1,} ){2}[aeiou][0-z]+[aeiou]" test.txt

The first pair of curly braces, {1,}, says that the preceding character class must occur one or more times. That class, [0-z], covers every character from 0 to z in the ASCII table: digits, letters, and the punctuation characters that lie between them. It is followed by the field separator, a space in this case. Change the value in the second pair of curly braces, {0} or {2}, to the desired field number minus 1. Then add a regular expression that states your criteria for that field.
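As a rough sketch of the same field-counting idea, the field number can be kept in a shell variable. The variable names field and skip are made up for this illustration. The sketch also swaps [0-z] for [^ ] (any non-space character), relaxes the middle + to *, and requires the field to end right after the closing vowel, so it is a variation on the commands above rather than a copy of them.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same two sample lines as in the answer.
printf '%s\n' 'abe 123 bdb 532' 'xyz 593 iau 591' > test.txt

field=3                # 1-based number of the field to test (hypothetical name)
skip=$((field - 1))    # how many leading fields the pattern must step over

# ([^ ]+ ){N} consumes exactly N space-terminated fields from the line start;
# the rest must be a field that starts and ends with a vowel.
result=$(grep -E "^([^ ]+ ){$skip}[aeiou][^ ]*[aeiou]( |\$)" test.txt)
printf '%s\n' "$result"
```

With field=3 this should select only the second sample line, matching the description above.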

Upvotes: 0

C. K. Young

Reputation: 223023

Another alternative is Perl, which can be useful if you're not super-familiar with awk.

#!/usr/bin/perl -anF,
use strict;
our %names;

BEGIN {
    while (<ARGV>) {
        chomp;
        $names{$_} = 1;
    }
}

print if $names{$F[0]};

To run (assume you named this file filter.pl):

perl filter.pl female.names.txt < records.txt

Upvotes: 0

John B

Reputation: 3646

You can do this with Awk:

awk -F, 'NR==FNR{a[$0]; next} ($1 in a)' female.names.txt file.csv

This prints the lines of your CSV file whose first field (the first name) appears in female.names.txt.

awk -F, 'NR==FNR{a[$0]; next} !($1 in a)' female.names.txt file.csv

This prints the lines whose first name is not found in female.names.txt.

This assumes the format of your female.names.txt file is something like:

Heather
Irene
Jane
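If both categories are wanted at once, the two commands above can probably be folded into a single pass that writes each record to one of two files. This is only a sketch: the sample records, the example.com addresses, and the output names female.csv and male.csv are assumptions made up for the illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data in the formats described above.
printf '%s\n' Heather Irene Jane > female.names.txt
printf '%s\n' \
  'John,Mills,81,Crescent,New York,NY,john@example.com,19/02/1954' \
  'Jane,Doe,12,High Street,Boston,MA,jane@example.com,03/07/1961' \
  'Irene,Adler,5,Baker Street,Albany,NY,irene@example.com,11/11/1970' \
  > file.csv

# While reading the first file (NR == FNR), remember every name.
# For the second file, pick the output file from the first field.
awk -F, '
  NR == FNR { names[$0]; next }
  { out = ($1 in names) ? "female.csv" : "male.csv"; print > out }
' female.names.txt file.csv

cat female.csv
```

The lookup is an exact, case-sensitive comparison, so stray spaces or Windows line endings in female.names.txt would likely make names fail to match.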

Upvotes: 4

Barmar

Reputation: 781028

Try this:

grep --file=<(sed 's/.*/^&,/' female.names.txt) datafile.csv

This turns every name in the list of female names into the regular expression ^name, so that it matches only at the beginning of a line and only when followed by a comma. Process substitution (a feature of bash, zsh, and ksh) then passes the generated patterns to grep as the pattern file to match against the data file.
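For shells without process substitution, a variant of the same idea writes the generated patterns to an ordinary file first, and grep -v then gives the remaining records. This is a sketch under assumptions: the sample records and the file names patterns.txt, female.csv, and rest.csv are invented for the illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data; Janet is deliberately not in the name list.
printf '%s\n' Heather Irene Jane > female.names.txt
printf '%s\n' \
  'John,Mills,81,Crescent,New York,NY,john@example.com,19/02/1954' \
  'Jane,Doe,12,High Street,Boston,MA,jane@example.com,03/07/1961' \
  'Janet,Lee,7,Park Road,Austin,TX,janet@example.com,21/09/1980' \
  > datafile.csv

# Turn each name into an anchored pattern such as ^Jane,
sed 's/.*/^&,/' female.names.txt > patterns.txt

# Records whose first field is a listed name, then everything else.
grep -f patterns.txt datafile.csv > female.csv
grep -v -f patterns.txt datafile.csv > rest.csv

cat female.csv
```

The trailing comma in each pattern is what keeps ^Jane, from matching the Janet record. Names containing regular expression metacharacters, such as a dot, would be interpreted as patterns rather than literal text.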

Upvotes: 0
