Reputation: 111
I'm trying to write a Bash script that reads a text file and prints every word that is followed by a comma, each on its own line. I tried grep, but it prints the whole line that contains a comma, and I had the same trouble with awk. I have checked the man page for grep, but it seems to focus more on flags than on pattern syntax. This is a normal .txt file, not a CSV; the commas are just ordinary English punctuation. Can anyone show me how to set up this script?
For example, if the text file contained a list of animals, like so:
The Veterinary clinic treats the following animals: dogs, cats, and birds
the script would display:
dogs,
cats,
Upvotes: 0
Views: 145
Reputation: 23794
And one more with sed:
#! /bin/sh
sed '
s/[^,]* //g
s/,[^,]*$/,/
s/,\(.\)/,\
\1/g
' <<EOF
The Veterinary clinic treats the following animals: dogs, cats, and birds
EOF
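How it works:
The first command, s/[^,]* //g, deletes every run of non-comma characters that ends in a space, which leaves dogs,cats,birds for the sample line.
The second command, s/,[^,]*$/,/, drops whatever follows the last comma, leaving dogs,cats,
The third command inserts a newline after each comma that is followed by another character, so each word ends up on its own line.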
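The three substitutions can be traced one at a time. The sketch below applies them cumulatively to the sample sentence and prints the text after each stage; the function name trace_sed_steps is made up for illustration and is not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: apply the three substitutions one after another
# and show the text after each stage.
trace_sed_steps() {
    line=$1
    # Step 1: delete every run of non-comma characters that ends in a space.
    step1=$(printf '%s\n' "$line" | sed 's/[^,]* //g')
    printf 'step 1: %s\n' "$step1"
    # Step 2: drop whatever follows the last comma.
    step2=$(printf '%s\n' "$step1" | sed 's/,[^,]*$/,/')
    printf 'step 2: %s\n' "$step2"
    # Step 3: break the line after each comma that is followed by a character.
    printf '%s\n' "$step2" | sed 's/,\(.\)/,\
\1/g'
}

trace_sed_steps 'The Veterinary clinic treats the following animals: dogs, cats, and birds'
```

For the sample sentence, step 1 should print dogs,cats,birds and step 2 dogs,cats, before the final two-line result.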
Upvotes: 0
Reputation: 74595
If your version of grep supports the -o switch then you could use that; otherwise, this should work in most versions of awk:
awk '{ for (i = 1; i <= NF; ++i) if ($i ~ /^[[:alpha:]]+,$/) print $i }' file
This loops through the fields of each line and prints those that consist only of alphabetic characters followed by a comma.
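Since the question asks for a script, the one-liner could be wrapped in a function along these lines. This is only a sketch: the name words_before_commas is hypothetical, and it assumes the input is either a file passed as an argument or standard input.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner.
# Usage: words_before_commas [file]   (reads standard input if no file is given)
words_before_commas() {
    awk '{
        # Visit every whitespace-separated field on the current line and
        # print the ones made only of letters followed by a trailing comma.
        for (i = 1; i <= NF; ++i)
            if ($i ~ /^[[:alpha:]]+,$/)
                print $i
    }' "$@"
}

printf '%s\n' 'The Veterinary clinic treats the following animals: dogs, cats, and birds' |
    words_before_commas
```

Because the pattern is anchored at both ends, fields such as 3, or a1, are skipped rather than partially matched.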
If you have GNU awk, you can simplify the approach by setting RS to a regular expression that matches one or more whitespace characters, so that each word becomes its own record:
awk -v RS='\\s+' '/^[[:alpha:]]+,$/' file
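Where a regular-expression RS is not available, a similar one-word-per-record effect can probably be had by splitting the input with tr first. Note that this swaps in a different technique (tr plus awk) and is not the GNU awk approach itself; the function name one_word_per_line_filter is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: put each whitespace-separated word on its own line
# with tr, then let awk filter the lines.
one_word_per_line_filter() {
    # tr -s turns every whitespace character into a newline and squeezes
    # runs of newlines, so each word becomes its own awk record.
    tr -s '[:space:]' '\n' | awk '/^[[:alpha:]]+,$/'
}

printf '%s\n' 'The Veterinary clinic treats the following animals: dogs, cats, and birds' |
    one_word_per_line_filter
```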
Upvotes: 1
Reputation: 720
It looks like you need grep's -o option (print only the matching part). If you consider a word to be a series of letters separated by spaces, then this pattern will do:
grep -o "[a-zA-Z]\+," file
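The \+ operator in a basic regular expression is a GNU extension, so a variant that may travel better uses -E with a POSIX character class. A minimal sketch, assuming a grep that provides -o (GNU and BSD grep do, although POSIX does not require it); the function name comma_words is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each run of letters that is directly
# followed by a comma, one match per line.
comma_words() {
    # -o prints only the matched text; -E enables extended regular
    # expressions so that + needs no backslash.
    grep -oE '[[:alpha:]]+,' "$@"
}

printf '%s\n' 'The Veterinary clinic treats the following animals: dogs, cats, and birds' |
    comma_words
```

Unlike an anchored awk match on whole fields, this pattern is not anchored, so a token such as abc1def, would yield def,.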
Upvotes: 1