Reputation: 9417
How do I find all lines that contain duplicate lowercase words?
I want to do this with egrep. This is what I've tried so far, but I keep getting invalid back-reference errors:
egrep '\<(.)\>\1' inputFile.txt
egrep -w '\b(\w)\b\1' inputFile.txt
For example, if I have the following file:
The sky was grey.
The fall term went on and on.
I hope every one has a very very happy holiday.
My heart is blue.
I like you too too too much
I love daisies.
It should find the following lines in the file:
The fall term went on and on.
I hope every one has a very very happy holiday.
I like you too too too much
It finds these lines because the words on, very, and too each occur more than once in their line.
Upvotes: 1
Views: 2946
Reputation: 41456
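I know this question is about grep, but here is an awk solution.
It is more flexible, since you can easily change the test on the counter c, which holds the highest count of any word in the line:
c==2 selects lines where the most repeated word occurs exactly twice
c>2 selects lines where a word occurs more than twice
c>=2 selects lines where a word occurs two or more times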
awk -F"[ \t.,]" '{c=0;for (i=1;i<=NF;i++) a[$i]++; for (i in a) c=c<a[i]?a[i]:c;delete a} c==2' file
The fall term went on and on.
I hope every one has a very very happy holiday.
It loops over all the words in a line and uses each word as an array index to count its occurrences.
A second loop then finds the highest count, which shows whether any word is repeated.
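The two loops described above can be sketched in a longer, commented form. This is only an illustration, not the answer's exact command: the helper name dup_words is hypothetical, the test is c>=2 rather than c==2 (written as max here), and it counts only all-lowercase words, which is an assumption based on the question.

```shell
# dup_words is a hypothetical helper name used only for this sketch.
# It prints every input line in which some lowercase word occurs at least twice.
dup_words() {
  awk '
  {
    max = 0
    split("", count)                  # portable way to clear the array
    n = split($0, words, /[ \t.,]+/)  # split the line on blanks, dots and commas
    for (i = 1; i <= n; i++) {
      w = words[i]
      if (w ~ /^[a-z]+$/) {           # count all-lowercase words only
        count[w]++
        if (count[w] > max) max = count[w]
      }
    }
  }
  max >= 2                            # pattern without action: print the line
  ' "$@"
}

printf '%s\n' \
  'The sky was grey.' \
  'The fall term went on and on.' \
  'I like you too too too much' | dup_words
```

Tracking the maximum while counting avoids the second loop. Changing the final test to max == 2 should reproduce the behaviour of the one-liner, which skips the line where too occurs three times.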
Upvotes: 1
Reputation: 174706
This is possible with grep's -E or -P option.
grep -E '(\b[a-z]+\b).*\b\1\b' file
Example:
$ cat file
The fall term went on and on.
I hope every one has a very very happy holiday.
Hi foo bar.
$ grep -E '(\b[a-z]+\b).*\b\1\b' file
The fall term went on and on.
I hope every one has a very very happy holiday.
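As a rough check, the same pattern can be run against the sample file from the question. The sketch below assumes GNU grep, since back-references and \b inside -E patterns are GNU extensions rather than part of POSIX extended regular expressions. The wrapper name find_dup_lines is hypothetical.

```shell
# find_dup_lines is a hypothetical wrapper name; assumes GNU grep.
find_dup_lines() {
  # (\b[a-z]+\b) captures a whole lowercase word,
  # \b\1\b requires the same word to appear again later as a whole word.
  grep -E '(\b[a-z]+\b).*\b\1\b' "$@"
}

printf '%s\n' \
  'The sky was grey.' \
  'The fall term went on and on.' \
  'I hope every one has a very very happy holiday.' \
  'My heart is blue.' \
  'I like you too too too much' \
  'I love daisies.' | find_dup_lines
```

With GNU grep this should print the three lines the question asks for, including the one where too occurs three times.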
Upvotes: 2
Reputation: 45243
Got it, you need to find duplicate words (all lowercase):
sed -n '/\b\([a-z]\+\)\b.*\b\1\b/p' infile
\1 is a back-reference to the first captured group in sed. GNU grep and egrep accept back-references too, as an extension in -E mode.
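A minimal sketch, assuming GNU sed, of why a back-reference pattern like this benefits from word boundaries: without \b around \1, the captured word can match inside a longer word. The variable names loose and strict and the test sentence are made up for this illustration.

```shell
# Both patterns assume GNU sed (\s, \b and \+ are GNU extensions).
loose='/\s\([a-z]*\)\s.*\1/p'          # \1 may match inside a longer word
strict='/\b\([a-z]\+\)\b.*\b\1\b/p'    # \1 must match a whole word

line='I like the theater'
printf '%s\n' "$line" | sed -n "$loose"    # prints the line: "the" reappears inside "theater"
printf '%s\n' "$line" | sed -n "$strict"   # prints nothing: no whole word is repeated
```

The strict form also drops the requirement for whitespace before the first word, so a repeated word at the very start of a line can be found.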
Upvotes: 1
Reputation: 8412
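Try:
egrep '[a-z]*' my_file
This matches the runs of lowercase characters in each line. Because [a-z]* also matches the empty string, every line is printed.
egrep '[a-z]*' --color my_file
This highlights the matched lowercase characters in color.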
Upvotes: 0