buydadip

Reputation: 9417

Finding lines containing words that occur more than once using grep

How do I find all lines that contain duplicate lowercase words? I want to do this with egrep. This is what I've tried so far, but I keep getting "invalid back reference" errors:

egrep '\<(.)\>\1' inputFile.txt
egrep -w '\b(\w)\b\1' inputFile.txt

For example, if I have the following file:

The sky was grey. 
The fall term went on and on.
I hope every one has a very very happy holiday.
My heart is blue.
I like you too too too much
I love daisies.

It should find the following lines in the file:

The fall term went on and on.
I hope every one has a very very happy holiday.
I like you too too too much

It should find these lines because the words on, very, and too each occur more than once on their line.

Upvotes: 1

Views: 2946

Answers (4)

Jotne

Reputation: 41456

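I know this is about grep, but here is an awk solution.
It is more flexible, since you can easily change the test on the counter c (the highest word count on the line):
c==2   the most repeated word occurs exactly twice
c>=2   some word occurs two or more times
c>2    some word occurs three or more times
etc.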

awk -F'[ \t.,]' '{c=0; for (i=1;i<=NF;i++) if ($i != "") a[$i]++; for (w in a) if (a[w] > c) c = a[w]; delete a} c==2' file
The fall term went on and on.
I hope every one has a very very happy holiday.

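The first loop runs through all the words on the line and counts each one in array a.
The second loop finds the highest count c, so with c==2 only lines whose most repeated word occurs exactly twice are printed; that is why "I like you too too too much" is missing from the output.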
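To make the flexibility concrete, here is a sketch of the same counting idea with the threshold passed in as a variable. It is an illustration under a few assumptions, not the answer's exact code: min is a hypothetical variable name, only all-lowercase words are counted (as the question asks), and leading/trailing punctuation is stripped from each whitespace-separated word instead of using a custom field separator.

```shell
#!/bin/sh
# Sample input: the six lines from the question.
cat > inputFile.txt <<'EOF'
The sky was grey. 
The fall term went on and on.
I hope every one has a very very happy holiday.
My heart is blue.
I like you too too too much
I love daisies.
EOF

# min (hypothetical name) is the repeat threshold: a line is printed
# when some lowercase word occurs on it at least min times.
# LC_ALL=C so that [a-z] means ASCII lowercase letters only.
out=$(LC_ALL=C awk -v min=2 '
{
  c = 0
  split("", seen)                            # empty the per-line counts (portable)
  for (i = 1; i <= NF; i++) {
    w = $i
    gsub(/^[^A-Za-z]+|[^A-Za-z]+$/, "", w)   # strip leading/trailing punctuation
    if (w ~ /^[a-z]+$/ && ++seen[w] > c)     # count lowercase words only
      c = seen[w]                            # c = highest count on this line
  }
}
c >= min' inputFile.txt)

printf '%s\n' "$out"
```

With min=2 this prints the three lines the question expects; setting min=3 would leave only the "too too too" line.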

Upvotes: 1

Avinash Raj

Reputation: 174706

This is possible with the -E or -P option. With GNU grep, back-references such as \1 work under -E as an extension, and \b matches a word boundary.

grep -E '(\b[a-z]+\b).*\b\1\b' file

Example:

$ cat file
The fall term went on and on.
I hope every one has a very very happy holiday.
Hi foo bar.
$ grep -E '(\b[a-z]+\b).*\b\1\b' file
The fall term went on and on.
I hope every one has a very very happy holiday.
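The \b anchors around both the group and \1 are what make this match whole words. A small comparison sketch, assuming GNU grep (for \b and for back-references under -E), shows what happens without them on two lines from the question: "My heart is blue." has no repeated word, but it does repeat the letter "e".

```shell
#!/bin/sh
# Assumption: GNU grep. LC_ALL=C so that [a-z] means ASCII lowercase only.
printf '%s\n' 'My heart is blue.' 'The fall term went on and on.' > sample.txt

# With \b on both sides, only a repeated whole word counts.
with_b=$(LC_ALL=C grep -E '(\b[a-z]+\b).*\b\1\b' sample.txt)

# Without the anchors, any repeated run of letters matches, e.g. the
# "e" in "heart" and in "blue", so the first line becomes a false positive.
without_b=$(LC_ALL=C grep -E '([a-z]+).*\1' sample.txt)

printf 'with boundaries:\n%s\n\nwithout boundaries:\n%s\n' "$with_b" "$without_b"
```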

Upvotes: 2

BMW

Reputation: 45243

Got it: you need to find duplicate words (all lowercase).

sed -n '/\b\([a-z][a-z]*\)\b.*\b\1\b/p' infile

Tools exist to serve your needs; restricting yourself to one tool is not a good approach.

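\1 (a back-reference) is a sed feature, but grep supports it too: POSIX basic regular expressions include back-references, and GNU grep also accepts them with -E (egrep) as an extension.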
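As an illustration that plain grep handles \1, here is a sketch in basic regular expression (BRE) syntax, run against the question's sample file. The back-reference itself is standard POSIX BRE; the \< and \> word anchors are GNU extensions, so this assumes GNU grep.

```shell
#!/bin/sh
# Assumption: GNU grep (for \< and \>). LC_ALL=C so [a-z] is ASCII lowercase.
cat > inputFile.txt <<'EOF'
The sky was grey. 
The fall term went on and on.
I hope every one has a very very happy holiday.
My heart is blue.
I like you too too too much
I love daisies.
EOF

# \<\([a-z][a-z]*\)\> captures one whole lowercase word;
# .*\<\1\> requires that same word to appear again later as a whole word.
out=$(LC_ALL=C grep '\<\([a-z][a-z]*\)\>.*\<\1\>' inputFile.txt)
printf '%s\n' "$out"
```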

Upvotes: 1

repzero

Reputation: 8412

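Try:

egrep '[a-z]+' my_file

This prints every line that contains at least one lowercase letter ([a-z]* would match every line, since * also matches the empty string).

egrep --color '[a-z]+' my_file

This also highlights the lowercase letters.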
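If the goal is to see the lowercase runs themselves rather than whole lines, the -o option prints each match on its own line. A minimal sketch, assuming a grep that has -o (GNU and BSD grep both do; POSIX does not require it):

```shell
#!/bin/sh
# -o prints only the matched text, one match per line.
# LC_ALL=C so that [a-z] does not match the uppercase "I".
runs=$(printf '%s\n' 'I love daisies.' | LC_ALL=C grep -oE '[a-z]+')
printf '%s\n' "$runs"
```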

Upvotes: 0
