Reputation: 182
How can I remove lines that contain more than 5 "." characters or fewer than 5 (simply put: keep only lines with exactly 5 dots)? How can I write a regex that detects this in bash using grep?
INPUT:
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EXPECTED OUTPUT:
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
Tried:
grep -P '[.]{5}' stuff.txt
grep -P '[\.]{5}' stuff.txt
grep -P '([\.]{5})' stuff.txt
grep -P '\.{5}' stuff.txt
grep -E '([\.]{5})' stuff.txt
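All of these attempts quantify a single dot, so `{5}` asks for five dots in a row rather than five dots anywhere on the line. A small illustration (dots_demo.txt and its two lines are made up; `-E` is used instead of `-P` because this pattern means the same thing in both flavours):

```shell
# One line with five separate dots, one with five dots in a row.
printf '%s\n' 'a.b.c.d.e.f' 'abc.....def' > dots_demo.txt

# \.{5} is a single literal dot repeated five times, i.e. five
# *consecutive* dots, so only the second line is printed.
grep -E '\.{5}' dots_demo.txt
```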
Upvotes: 0
Views: 467
Reputation: 8781
Using a Perl one-liner to print a line only if its number of "." characters exceeds 5:
> cat five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
> perl -ne '{ while(/\./g){$count++} print if $count > 5; $count=0 } ' five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
>
Upvotes: -1
Reputation: 5635
To detect the bad IP address specifically:
Can you be certain that the IP address is always surrounded by commas and never contains spaces, i.e. that it is never the first or last field?
If so, you might get away with:
grep -E ',\w+((\.\w+){1,2}|(\.\w+){4,}),'
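Checked against the sample input, and assuming a valid address always has exactly four dot-separated parts (so the pattern should flag two, three, or five or more parts), a pattern of this shape reports only the first line. Note that `\w` is a GNU extension to ERE:

```shell
cat > stuff.txt <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# Flags a comma-delimited field of dot-separated word characters with
# 2-3 or 5+ parts, i.e. anything other than the 4 parts of an IPv4 address.
grep -E ',\w+((\.\w+){1,2}|(\.\w+){4,}),' stuff.txt
```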
If you cannot be certain of that, it is quite difficult to distinguish a broken IP address containing spaces from an ordinary sentence, so you might have to specify the column explicitly.
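One way to select the column, offered as a sketch rather than a solution for every input: let awk split on commas and count the dot-separated parts of one field. This assumes the address is always the second field, as it is in the sample, and it does not check that each part is a number from 0 to 255:

```shell
cat > stuff.txt <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# -F, makes each comma-separated field a column; split() returns how many
# dot-separated parts field 2 has ("[.]" is a literal dot). A line is
# printed only when that count is 4, as for a well-formed IPv4 address.
awk -F, 'split($2, parts, "[.]") == 4' stuff.txt
```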
Upvotes: 0
Reputation: 24812
You can display only the lines that contain exactly 5 dots as follows:
grep '^[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*$' stuff.txt
Or, if you want to factor out the repetition:
grep -E '^([^.]*\.){5}[^.]*$' stuff.txt
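Applied to the sample input, both commands keep only the second line. grep never edits a file in place, so to actually remove the other lines one option is to write the result to a temporary file and move it over the original; filtered.txt is an arbitrary name chosen for this sketch:

```shell
cat > stuff.txt <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# Both forms print only the line with exactly five dots.
grep '^[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*$' stuff.txt
grep -E '^([^.]*\.){5}[^.]*$' stuff.txt

# Overwrite stuff.txt with the kept lines (mv runs only if grep found a match).
grep -E '^([^.]*\.){5}[^.]*$' stuff.txt > filtered.txt && mv filtered.txt stuff.txt
```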
Using ERE (-E) in this second one avoids having to escape the \(\) and \{\}; in the first one, grep's default BRE flavour is sufficient.
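For comparison, the factored pattern can also be written in grep's default BRE flavour, where the group and the interval need their backslashes. A sketch on the sample input:

```shell
cat > stuff.txt <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# BRE: \( \) groups and \{5\} repeats the group exactly five times.
grep '^\([^.]*\.\)\{5\}[^.]*$' stuff.txt
```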
^ and $ are anchors representing the start and end of the line respectively; they make sure we match the whole line rather than just a part of it that contains 5 dots.
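To see what the anchors contribute, compare the factored pattern with and without them on the sample input (the -c flag prints the number of matching lines):

```shell
cat > stuff.txt <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# Unanchored: any line with at least five dots matches, so the
# eight-dot line is counted too.
grep -cE '([^.]*\.){5}' stuff.txt           # prints 2

# Anchored: the whole line must consist of exactly five dots
# plus non-dot characters.
grep -cE '^([^.]*\.){5}[^.]*$' stuff.txt    # prints 1
```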
[^.] is a negated character class that matches anything but a dot.
Each one is quantified with * so that any number of non-dot characters can occur between dots (you might want to change that to + if consecutive dots shouldn't be matched; note that + also rejects lines that start or end with a dot).
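A small made-up file (plus_demo.txt) shows the difference between * and +; every line below has exactly five dots:

```shell
# Five dots each: separated, two in a row, and one leading dot.
printf '%s\n' 'a.b.c.d.e.f' 'a..b.c.d.e' '.a.b.c.d.e' > plus_demo.txt

# With *, empty runs between dots are allowed, so all three lines match.
grep -cE '^([^.]*\.){5}[^.]*$' plus_demo.txt   # prints 3

# With +, at least one non-dot character is required before each dot
# and after the last one, so only the first line is printed.
grep -E '^([^.]+\.){5}[^.]+$' plus_demo.txt
```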
\. matches a literal dot, rather than any character, which the meta-character . would match outside of a character class.
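The difference between a bare dot, an escaped dot, and a dot inside a bracket expression, using another made-up file (dot_demo.txt):

```shell
printf '%s\n' 'a.c' 'abc' > dot_demo.txt

grep -c 'a.c' dot_demo.txt     # prints 2: a bare . matches any character
grep -c 'a\.c' dot_demo.txt    # prints 1: \. matches only a literal dot
grep -c 'a[.]c' dot_demo.txt   # prints 1: inside [ ] the dot is literal too
```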
Upvotes: 2