Reputation: 111

Writing a Bash script pulling every word that is separated by a comma out of a text file

I'm trying to write a Bash script that reads a text file and prints every word that is followed by a comma, each on its own line. I tried using grep, but it prints the whole line that contains a comma, and I had the same trouble with awk. I have checked the man page for grep, but it seems to focus more on flags than on syntax. This is a normal .txt file, not a CSV; the commas are just the ones normal English grammar calls for. Can anyone show me how to set up this script?

For example, if the text file contained a list of animals, like so:

The Veterinary clinic treats the following animals: dogs, cats, and birds

the script would display:

dogs,
cats,

Upvotes: 0

Answers (3)

ceving

Reputation: 23794

And one more with sed:

#! /bin/sh
sed '
s/[^,]* //g
s/,[^,]*$/,/
s/,\(.\)/,\
\1/g
' <<EOF
The Veterinary clinic treats the following animals: dogs, cats, and birds
EOF

How it works:

Delete every run of non-comma characters that ends in a space. This drops the words that are not followed by a comma, along with the spaces between words.
Delete whatever follows the last comma on the line, keeping the comma itself.
Insert a newline after every comma that is not at the end of the line.
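The question asks for a script that reads a file, while the sed answer feeds its input through a here-document. A minimal sketch, assuming the file name is passed as an argument: the three substitutions are unchanged, and one added command, `/,/!d`, deletes lines that contain no comma at all. Without that guard, a line such as `no commas here` would still print its last word, `here`. The function name `comma_words_sed` is hypothetical.

```shell
#!/bin/sh
# Sketch based on the sed answer, reading a file instead of a here-document.
# "/,/!d" is an added assumption: it discards lines without any comma,
# which would otherwise leave their last word behind.
comma_words_sed() {
    sed '
/,/!d
s/[^,]* //g
s/,[^,]*$/,/
s/,\(.\)/,\
\1/g
' "$1"
}

# Usage (animals.txt is a hypothetical file name):
#   comma_words_sed animals.txt
```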

Upvotes: 0

Tom Fenech

Reputation: 74595

If your version of grep supports the -o switch, you could use that; otherwise, this should work in most versions of awk:

awk '{ for (i = 1; i <= NF; ++i) if ($i ~ /^[[:alpha:]]+,$/) print $i }' file

Loop through all the fields on each line and print those that consist only of alphabetic characters followed by a comma.
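To get the script the question asks for, the portable awk loop can be wrapped in a shell function. This is a sketch; the name `comma_words_awk` is hypothetical, and because it passes "$@" straight to awk, it reads the named files or, with no arguments, standard input.

```shell
#!/bin/sh
# Hypothetical wrapper around the portable awk loop.
# awk splits each line on runs of blanks by default, so "dogs," is one field.
comma_words_awk() {
    awk '{
        for (i = 1; i <= NF; ++i)
            if ($i ~ /^[[:alpha:]]+,$/)   # letters only, then one trailing comma
                print $i
    }' "$@"
}

# Usage with the sample sentence on standard input:
printf '%s\n' 'The Veterinary clinic treats the following animals: dogs, cats, and birds' |
    comma_words_awk
```

For the sample sentence this prints `dogs,` and `cats,` on separate lines; `animals:` and `birds` are skipped because they do not end in a comma.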

If you have GNU awk, you can simplify the approach by setting RS (the record separator) to a regular expression matching any run of whitespace, so that each word becomes its own record:

awk -v RS='\\s+' '/^[[:alpha:]]+,$/' file
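Where GNU awk is not available, a similar one-word-per-record effect can be had by splitting the text with tr before awk sees it. This swaps in tr for the regular-expression RS, so it is a different technique with the same result; it assumes a tr that accepts the `[:space:]` class (GNU and BSD tr do), and the function name `comma_words_tr` is hypothetical.

```shell
#!/bin/sh
# Portable stand-in for the gawk RS trick (not the answer's own code):
# tr maps every whitespace character to a newline and -s squeezes
# repeated newlines, so awk receives exactly one word per record.
comma_words_tr() {
    tr -s '[:space:]' '\n' | awk '/^[[:alpha:]]+,$/'
}

# Usage: comma_words_tr < animals.txt   (hypothetical file name)
printf '%s\n' 'The Veterinary clinic treats the following animals: dogs, cats, and birds' |
    comma_words_tr
```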

Upvotes: 1

fanton

Reputation: 720

Looks like you need to know about grep's -o option (print only the matching part of each line). If you consider a word to be a series of letters separated by spaces, then this pattern will do:

grep -o "[a-zA-Z]\+," file
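A sketch of a complete script around this command, assuming a grep that supports -o (GNU and BSD grep do). It uses -E with `+` instead of `\+`, because `\+` in a basic regular expression is a GNU extension. The helper name `comma_words_grep` and the one-file restriction (which keeps grep from prefixing matches with file names) are illustrative choices, not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper wrapping grep -o; expects exactly one file name.
comma_words_grep() {
    if [ "$#" -ne 1 ]; then
        echo "usage: comma_words_grep FILE" >&2
        return 1
    fi
    # -o prints only the matched text; -E enables extended regex so "+" works.
    grep -oE '[[:alpha:]]+,' "$1"
}

# Usage (animals.txt is a hypothetical file name):
#   comma_words_grep animals.txt
```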

Upvotes: 1
