creed

Reputation: 182

Removing number of dots with grep using regular expression

How can I remove lines that contain more or fewer than 5 dots (simply put: keep only lines with exactly 5 dots)? How can I write a regex that detects this in bash using grep?

INPUT:

yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json

EXPECTED OUTPUT:

yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json

Tried:

grep -P '[.]{5}' stuff.txt
grep -P '[\.]{5}' stuff.txt
grep -P '([\.]{5})' stuff.txt
grep -P '\.{5}' stuff.txt
grep -E '([\.]{5}' stuff.txt

Upvotes: 0

Views: 467

Answers (3)

stack0114106

Reputation: 8781

Using a Perl one-liner to print a line only if its number of "." exceeds 5:

> cat five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
> perl -ne '{ while(/\./g){$count++} print if $count > 5; $count=0 } ' five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
> 
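The same counting idea can be sketched without Perl. This is only an illustration that swaps in awk: splitting each line on a literal dot yields one more field than there are dots, so NF - 1 is the dot count. The temporary file and the variable names are made up for the example.

```shell
# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# -F'[.]' splits on a literal dot; NF - 1 is then the number of dots.
exactly_five=$(awk -F'[.]' 'NF - 1 == 5' "$sample")    # lines to keep
more_than_five=$(awk -F'[.]' 'NF - 1 > 5' "$sample")   # what the Perl one-liner prints

printf 'exactly five:   %s\n' "$exactly_five"
printf 'more than five: %s\n' "$more_than_five"
rm -f "$sample"
```

Changing the comparison to != 5 would list every line the question wants removed.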

Upvotes: -1

Gem Taylor

Reputation: 5635

To detect specifically the bad IP address

Can you be certain that the IP address is always surrounded by commas and does not contain spaces - i.e. is never the first or last field?

Then, you might get away with:

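grep -E ',\w+((\.\w+){1,2}|(\.\w+){4,}),'

Each quantifier counts the dotted groups after the first one, so this flags fields with 1-2 dots or with 4 or more dots, and leaves a well-formed address (3 dots) alone.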

If not, it is quite difficult to distinguish between a broken IP form with spaces and an ordinary sentence, so you might have to specify the column.
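Specifying the column could look something like the sketch below. It assumes the address is always the second comma-separated field and that a well-formed address has exactly three dots; neither is confirmed by the question, and the file and variable names are made up for the example.

```shell
# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# -F, splits on commas. Field 2 is copied into f first, because running
# gsub on $2 itself would make awk rebuild the whole record.
# gsub returns the number of dots it removed from the copy.
bad_ip=$(awk -F, '{ f = $2; if (gsub(/\./, "", f) != 3) print }' "$sample")

printf '%s\n' "$bad_ip"
rm -f "$sample"
```

Because only the second field is inspected, dots in the URL or in the description no longer affect the result.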

Upvotes: 0

Aaron

Reputation: 24812

You can display only the lines that contain exactly 5 dots as follows:

grep '^[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*$' stuff.txt

or, if you want to factor out the repetition:

grep -E '^([^.]*\.){5}[^.]*$' stuff.txt

Using -E (ERE) in the second one avoids having to escape the grouping and repetition operators as \(\) and \{\}; in the first one, grep's default BRE regex flavour is sufficient.

^ and $ are anchors representing the start and end of the line respectively; they make sure we match the whole line and not just a part of it that contains 5 dots.

[^.] is a negated character class that will match anything but a dot.
They are quantified with * so that any number of non-dot characters can occur between each dot (you might want to change that to + if lines with consecutive dots shouldn't be matched).

\. matches a literal dot (rather than any character, which the meta-character . outside of a character class would).
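As a quick check, here is a small sketch that runs the factored pattern against the two lines from the question. The temporary file and the variable names are made up for the example; -v is grep's standard option for inverting the match.

```shell
# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# Lines with exactly five dots anywhere on the line.
kept=$(grep -E '^([^.]*\.){5}[^.]*$' "$sample")
# -v inverts the match: lines whose dot count is anything other than five.
dropped=$(grep -Ev '^([^.]*\.){5}[^.]*$' "$sample")

printf 'kept:    %s\n' "$kept"
printf 'dropped: %s\n' "$dropped"
rm -f "$sample"
```

Note that the count covers the whole line, so the two dots in the URL are included: the second sample line passes because 3 dots in the address plus 2 in the URL make 5.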

Upvotes: 2
