Reputation: 91
I have been looking at several other answers and couldn't find what I want.
I have a big file with some URLs in it, and I am looking for URLs that contain the pattern tt. Of course, every line has http in it, so if I do
grep tt myfile | wc -l
I get all the lines of the file. How can I find matches for tt without matching the tt in http?
I tried --exclude and it doesn't work; I think --exclude only works on file paths, right?
I could use sed to replace http with something else and then grep normally, but how elegant is that? There must be another way...
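For comparison, the sed idea could look like the minimal sketch below. It assumes every URL starts with http:// or https:// and uses made-up sample lines in a scratch file instead of myfile. Only the scheme prefix is deleted, so a tt later in the URL (for example in /http.html) still counts.

```shell
# Hypothetical sample data standing in for myfile.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
http://example.com/notete.html
http://example.com/somett.html
https://example.com/tt.html
http://example.com/index.html
EOF

# Delete every "http://" or "https://" prefix, then count the lines
# that still contain "tt".
count=$(sed -E 's#https?://##g' "$tmp" | grep -c tt)
echo "$count"   # prints 2

rm -f "$tmp"
```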
Upvotes: 0
Views: 1019
Reputation: 9
You can use grep -v to exclude the lines matching a pattern, like this:
grep tt myfile | grep -v http | wc -l
This first selects the lines containing "tt", then excludes those containing "http", and counts the rest.
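Note that grep -v drops a whole line as soon as it contains http anywhere, so this pipeline only counts tt lines that have no http at all; on a file where every line contains http, as in the question, it prints 0. A small check with made-up lines (grep -vc counts the non-matching lines directly, so wc is not needed):

```shell
# Hypothetical sample lines; only the last one lacks "http".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
http://example.com/somett.html
http://example.com/notete.html
my.tt.com
EOF

# Keep lines containing "tt", then count those that do NOT contain "http".
count=$(grep tt "$tmp" | grep -vc http)
echo "$count"   # prints 1 (only my.tt.com survives)

rm -f "$tmp"
```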
Upvotes: 0
Reputation: 10039
egrep -c 'http://[^ ?]*tt' YourFile
(grep -E works too) uses a pattern that keeps the http part out of the search criteria: tt is only looked for after http://, up to the next space or ?.
Upvotes: 0
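A quick check of the same pattern with grep -E, which is equivalent to egrep (recent GNU grep releases warn that egrep is obsolescent). The sample lines are made up for illustration:

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
some text http://example.com/redirect?http://some/test.html
some text http://example.com/notete.html
some text http://example.com/somett.html
some text http://example.com/http.html
EOF

# "tt" must appear after "http://" with no space or "?" in between,
# so the "tt" inside the scheme itself never matches.
count=$(grep -Ec 'http://[^ ?]*tt' "$tmp")
echo "$count"   # prints 2 (somett.html and http.html)

rm -f "$tmp"
```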
Reputation: 76636
You can use the -P switch to have grep interpret the pattern as a Perl regular expression. Then you can use lookaround assertions to match tts that are not preceded by h and not followed by p:// (or ps://, for https).
grep -iP '(?<!h)tt(?!ps?://)' myfile | wc -l
Upvotes: 2
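A sketch of how this behaves on a few made-up lines, assuming a grep built with PCRE support (needed for -P). Because of -i, an uppercase HTTP:// is handled too. Note that the tt in a path such as /http.html is also preceded by h, so that line is not counted either.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
http://example.com/notete.html
https://example.com/somett.html
HTTP://example.com/index.html
http://example.com/http.html
EOF

# -c: count lines, -i: case-insensitive, -P: Perl-compatible regex.
# (?<!h) rejects a tt preceded by h; (?!ps?://) rejects one followed by p:// or ps://.
count=$(grep -ciP '(?<!h)tt(?!ps?://)' "$tmp")
echo "$count"   # prints 1 (only somett.html)

rm -f "$tmp"
```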
Reputation: 63972
Given the following test file:
some text http://example.com/redirect?http://some/test.html #not wanted
some text http://example.com/notete.html #not wanted
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
this command:
grep -P 'http://\S*tt(?!p:)' file
prints
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
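To reproduce this run, the test file can be written to a scratch file first (the mktemp location is just an assumption for the sketch); grep should print the five lines shown above and none of the "not wanted" ones.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
some text http://example.com/redirect?http://some/test.html #not wanted
some text http://example.com/notete.html #not wanted
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
EOF

# Print the lines with a "tt" after "http://" that is not followed by "p:".
matches=$(grep -P 'http://\S*tt(?!p:)' "$tmp")
printf '%s\n' "$matches"

rm -f "$tmp"
```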
where the regex means:
http:// 'http://'
----------------------------------------------------------------------
\S* non-whitespace (all but \n, \r, \t, \f,
and " ") (0 or more times (matching the
most amount possible))
----------------------------------------------------------------------
tt 'tt'
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
p: 'p:'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
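The lookahead only matters for lines like the redirect one, where a second http:// follows the first: without it, the tt of that second http satisfies \S*tt. Counting with and without (?!p:) on the same test file shows the difference (a sketch using a scratch file):

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
some text http://example.com/redirect?http://some/test.html #not wanted
some text http://example.com/notete.html #not wanted
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
EOF

# Without the lookahead, the first (unwanted) redirect line also matches.
without=$(grep -cP 'http://\S*tt' "$tmp")
with=$(grep -cP 'http://\S*tt(?!p:)' "$tmp")
echo "$without $with"   # prints "6 5"

rm -f "$tmp"
```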
and
grep -cP 'http://\S*tt(?!p:)' file
counts the matching lines.
If the http:// at the start is optional,
grep -P '(http://)?\S*tt(?!p:)' file
will do the same job and for the same input prints
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
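Because the prefix group is optional (written here as (http://)?), it only widens the reported match; whether a line is selected depends solely on whether it contains a tt not followed by p:. A sketch comparing it with the bare pattern tt(?!p:) on the same test file:

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
some text http://example.com/redirect?http://some/test.html #not wanted
some text http://example.com/notete.html #not wanted
some text http://example.com/redirect?http://some/anyttany.html #wanted
some text http://example.com/http.html #wanted
some text http://example.com/tt.html #wanted
some text http://example.com/somett.html #wanted
some text http://example.com/somettsome.html #wanted
some text /example.com/somettsome.html #wanted (path only)
EOF

# Both patterns select the same six lines, including the path-only one.
with_prefix=$(grep -cP '(http://)?\S*tt(?!p:)' "$tmp")
bare=$(grep -cP 'tt(?!p:)' "$tmp")
echo "$with_prefix $bare"   # prints "6 6"

rm -f "$tmp"
```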
To capture only the URLs (and paths):
grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*' file
prints
http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
/example.com/somettsome.html
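The \K is what trims the output: with -o, grep prints everything the pattern consumed, and \K resets the start of the reported match after the lazy .*? prefix. A sketch on one sample line from the test file:

```shell
line='some text http://example.com/tt.html #wanted'

# Without \K, -o also prints the leading "some text ".
without=$(printf '%s\n' "$line" | grep -oP '.*?(http:/)?/\S*tt(?!p:)\S*')
# With \K, only the part after the reset point is printed.
with=$(printf '%s\n' "$line" | grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*')

printf '%s\n' "$without"   # prints "some text http://example.com/tt.html"
printf '%s\n' "$with"      # prints "http://example.com/tt.html"
```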
To capture only the http:// URLs:
grep -oP '.*?\Khttp://\S*tt(?!p:)\S*' file
prints
http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
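One practical difference between -c and -o: grep -c counts matching lines, while -o prints every match, so piping it to wc -l counts URLs. A sketch with a hypothetical line holding two matching URLs:

```shell
# Hypothetical input: one line, two URLs with a tt after the scheme.
line='see http://x.example/tt.html and http://y.example/att.html'

# -c counts lines that match at least once.
lines=$(printf '%s\n' "$line" | grep -cP 'http://\S*tt(?!p:)')
# -o prints each match on its own line; tr strips padding some wc versions add.
urls=$(printf '%s\n' "$line" | grep -oP '.*?\Khttp://\S*tt(?!p:)\S*' | wc -l | tr -d ' ')

echo "$lines $urls"   # prints "1 2"
```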
Upvotes: 1
Reputation: 41460
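You can use awk like this: use "http:" as the field separator and test only the last field ($NF) for tt, so a tt that belongs to an http: prefix is never tested. Given this file:
cat file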
http://example.com
http://google.com
my.tt.com
t.foo.bar
http://foobar.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/notete.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
awk -F"http:" '$NF~/tt/' file
my.tt.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
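Since the question pipes to wc -l, awk can also do the counting itself. A minimal sketch on a subset of the sample file above; the n+0 makes awk print 0 rather than an empty string when nothing matches:

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
http://example.com
my.tt.com
http://example.com/http.html
http://example.com/notete.html
http://example.com/somett.html
EOF

# Split on "http:" and count the lines whose last field contains "tt".
count=$(awk -F'http:' '$NF ~ /tt/ { n++ } END { print n+0 }' "$tmp")
echo "$count"   # prints 3

rm -f "$tmp"
```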
Upvotes: 0