Reputation: 65
Given a text file, say phrases.txt, with contents:
Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!
What would be an appropriate grep command in bash that outputs only the lines containing exactly one occurrence of laughter, where laughter is defined as a string of the form Hahahahaha! with arbitrarily many repetitions of ha? The first H is always capital and the other ones are not, and the string must end in !. In my example, the egrep command should output:
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
A command I came up with was:
egrep "(Ha(ha)*\!){1}" phrases.txt
The issue with my command is that it does not output only the lines with a single occurrence of laughter. With my command, line 4 (Hahaha!Hahaha!) and line 8 (Ha! Ha! Ha!) also get printed, which is not what I want.
Is there a nice way to do this with only grep?
Upvotes: 3
Views: 1976
Reputation: 626738
If you use GNU grep or pcregrep, which support PCRE regexes, you may use
grep -P '^(?!(?:.*Ha(ha)*!){2}).*Ha(ha)*!'
The pattern is:
^(?!(?:.*YOUR_PATTERN_HERE){2}).*YOUR_PATTERN_HERE
where YOUR_PATTERN_HERE stands for the pattern you want to occur only once in the string.
Details
^ - start of a string
(?!(?:.*YOUR_PATTERN_HERE){2}) - a negative lookahead that fails the match if, immediately to the right of the current location (here, the start of the string), there are two consecutive occurrences of:
  .* - any 0+ chars other than line break chars
  YOUR_PATTERN_HERE - your required pattern
.* - any 0+ chars other than line break chars
YOUR_PATTERN_HERE - your required pattern.
See the online demo:
s="Hahahahahasdhfjshfjshdhfjhdf
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
Hahaha!Hahaha!
dfhjfsf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
Ha! Ha! Ha!"
echo "$s" | grep -P '^(?!(?:.*Ha(ha)*!){2}).*Ha(ha)*!'
Output:
Hahahaha!
jdsahjhshfjhfHahahaha!dhsjfhajhfjhf
sdfjsjf Hahaha! djfhjsdfh
Ha! hdfshdfs
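The generic template can also be wrapped in a small shell function. The following is only a sketch: the name grep_once is hypothetical, it assumes a GNU grep built with PCRE support (-P), and it wraps the pattern in a non-capturing group so that alternations inside it stay contained. The regex is assembled from single-quoted pieces so that the ! is never exposed to interactive history expansion.

```shell
# Hypothetical helper (not part of grep): print the lines of a file
# that contain exactly one match of a PCRE pattern.
# Assumes GNU grep with -P support.
grep_once() {
  local pattern=$1 file=$2
  # ^(?!(?:.*(?:P)){2}) rejects lines with two or more matches,
  # .*(?:P) then requires at least one match.
  local regex='^(?!(?:.*(?:'"$pattern"')){2}).*(?:'"$pattern"')'
  grep -P "$regex" "$file"
}

# Sample input copied from the question.
tmp=$(mktemp)
printf '%s\n' \
  'Hahahahahasdhfjshfjshdhfjhdf' \
  'Hahahaha!' \
  'jdsahjhshfjhfHahahaha!dhsjfhajhfjhf' \
  'Hahaha!Hahaha!' \
  'dfhjfsf' \
  'sdfjsjf Hahaha! djfhjsdfh' \
  'Ha! hdfshdfs' \
  'Ha! Ha! Ha!' > "$tmp"

result=$(grep_once 'Ha(ha)*!' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With the sample data this should print the same four lines as the output above.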
Upvotes: 0
Reputation: 742
If you are okay with pipes, then:
egrep '(Ha(ha)*!)' yourfile.txt | egrep -v '(Ha(ha)*!).*(Ha(ha)*!)'
First filter for lines with at least one laugh, then filter out the ones that have more than one laugh.
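To make the two stages visible, here is a sketch that runs them separately on the sample data from the question. It uses grep -E, which is equivalent to egrep (recent GNU grep releases print a warning that egrep is obsolescent); the temporary file and the variable names are assumptions made for illustration.

```shell
# Sample input copied from the question.
tmp=$(mktemp)
printf '%s\n' \
  'Hahahahahasdhfjshfjshdhfjhdf' \
  'Hahahaha!' \
  'jdsahjhshfjhfHahahaha!dhsjfhajhfjhf' \
  'Hahaha!Hahaha!' \
  'dfhjfsf' \
  'sdfjsjf Hahaha! djfhjsdfh' \
  'Ha! hdfshdfs' \
  'Ha! Ha! Ha!' > "$tmp"

# Stage 1: keep lines that contain at least one laugh.
at_least_one=$(grep -E 'Ha(ha)*!' "$tmp")

# Stage 2: drop lines where a second laugh follows the first one.
exactly_one=$(printf '%s\n' "$at_least_one" | grep -Ev 'Ha(ha)*!.*Ha(ha)*!')

printf 'stage 1:\n%s\n\nstage 2:\n%s\n' "$at_least_one" "$exactly_one"
rm -f "$tmp"
```

Stage 1 still contains Hahaha!Hahaha! and Ha! Ha! Ha!; stage 2 removes exactly those two lines.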
Note: {1} only says how many times the preceding chunk repeats; it doesn't check the rest of the string to insist that there is only one. So a{1} and a are actually the same.
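A quick way to check this is to list the individual matches with grep -o, once with the {1} quantifier and once without it. This is a minimal sketch; the variable names are made up for illustration.

```shell
line='Ha! Ha! Ha!'

# -o prints every match on its own line instead of the whole input line.
with_quantifier=$(printf '%s\n' "$line" | grep -Eo '(Ha(ha)*!){1}')
without_quantifier=$(printf '%s\n' "$line" | grep -Eo 'Ha(ha)*!')

printf 'with {1}:\n%s\nwithout:\n%s\n' "$with_quantifier" "$without_quantifier"
```

Both variants report the same three Ha! matches, so {1} does nothing to limit how many laughs a line may contain.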
Upvotes: 2