momeunier

Reputation: 91

How to search for a pattern using grep and exclude another pattern

I have been looking at several other answers and couldn't find what I want.

I have a big file of URLs and I am looking for the URLs that contain the pattern tt. Of course, every line has http in it, so if I do

grep tt myfile | wc -l

I get every line of the file. How can I find the lines that match tt without counting the tt in http?

I tried --exclude and it doesn't work. I think --exclude only applies to file names, right?

I could use sed to replace http with something else and then grep normally, but how elegant is that? There must be another way...

Upvotes: 0

Views: 1019

Answers (5)

JDaRiva

Reputation: 9

You can use grep -v to exclude the lines that match a pattern, like this:

grep tt myfile | grep -v http | wc -l

This first selects the lines containing "tt", then excludes those containing "http", and finally counts what is left.
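
A minimal sketch of what this pipeline keeps, using made-up sample data (the temporary file and its three lines are assumptions for illustration, not the asker's file). Because grep -v http drops any line that contains http anywhere, only lines where tt appears and http does not appear at all are counted:

```shell
# Hypothetical sample data (not from the question).
sample=$(mktemp)
printf '%s\n' \
  'http://example.com/somett.html' \
  'http://example.com/other.html' \
  'my.tt.com' > "$sample"

# All three lines contain "tt" (the second one only inside "http"),
# but only the last line survives the "grep -v http" filter.
count=$(grep tt "$sample" | grep -v http | wc -l | tr -d ' ')
echo "$count"   # prints 1
rm -f "$sample"
```

If every line contains http, as in the question, this pipeline counts nothing.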

Upvotes: 0

NeronLeVelu

Reputation: 10039

egrep -c 'http://[^ ?]*tt' YourFile
  • -c counts the matching lines
  • egrep enables extended regex (you can also use grep -E); the pattern consumes the http:// prefix, so its tt is excluded from the search
  • the [^ ?] class excludes spaces and the special URL character ? (suggested in the comments by Jotne and others) to avoid picking up a tt from a possible second URL on the same line.
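
As a rough illustration of the points above, here is the same pattern run with grep -E against made-up sample lines (the temporary file and its contents are assumptions, not the asker's data). A tt is only counted when it follows http:// with no space or ? in between:

```shell
# Hypothetical sample data (not from the question).
sample=$(mktemp)
printf '%s\n' \
  'http://example.com/somett.html' \
  'http://example.com/other.html' \
  'http://example.com/redirect?http://some/test.html' \
  'http://example.com/redirect?http://some/anyttany.html' > "$sample"

# Lines 1 and 4 have "tt" after "http://" within the same URL part;
# lines 2 and 3 only have the "tt" of "http" itself.
count=$(grep -cE 'http://[^ ?]*tt' "$sample")
echo "$count"   # prints 2
rm -f "$sample"
```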

Upvotes: 0

Amal

Reputation: 76636

You can use the -P switch to have grep interpret the pattern as a Perl regular expression. Then you can use lookaround assertions to match occurrences of tt that are not preceded by h and not followed by p:// or ps://.

grep -iP '(?<!h)tt(?!ps?://)' myfile | wc -l
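
A small sketch of how the lookarounds behave, assuming a grep built with PCRE support (GNU grep's -P) and made-up sample lines that are not from the question. Note that the lookbehind also rejects a tt inside a path component such as http.html, since that tt is preceded by h as well:

```shell
# Hypothetical sample data (not from the question).
sample=$(mktemp)
printf '%s\n' \
  'http://example.com/somett.html' \
  'http://example.com/other.html' \
  'https://example.com/tt.html' \
  'http://example.com/http.html' > "$sample"

# Lines 1 and 3 contain a "tt" that is not preceded by "h";
# lines 2 and 4 only contain "tt" inside "http".
count=$(grep -ciP '(?<!h)tt(?!ps?://)' "$sample")
echo "$count"   # prints 2
rm -f "$sample"
```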

Upvotes: 2

clt60

Reputation: 63972

Given the following test file:

some text http://example.com/redirect?http://some/test.html             #not wanted
some text http://example.com/notete.html                                #not wanted
some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

the command

grep -P 'http://\S*tt(?!p:)' file

prints

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted

The pattern means:

  http://                  'http://'
----------------------------------------------------------------------
  \S*                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (0 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  tt                       'tt'
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    p:                       'p:'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------

and

grep -cP 'http://\S*tt(?!p:)' file

counts the matching lines.

If the http:// at the start is optional,

 grep -P '(?:http://)?\S*tt(?!p:)' file

does the same job and, for the same input, prints

some text http://example.com/redirect?http://some/anyttany.html         #wanted
some text http://example.com/http.html                                  #wanted
some text http://example.com/tt.html                                    #wanted
some text http://example.com/somett.html                                #wanted
some text http://example.com/somettsome.html                            #wanted
some text /example.com/somettsome.html                                  #wanted (path only)

To capture the URLs (and paths):

grep -oP '.*?\K(http:/)?/\S*tt(?!p:)\S*' file

prints

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
/example.com/somettsome.html

To capture only the http:// URLs:

grep -oP '.*?\Khttp://\S*tt(?!p:)\S*' file

http://example.com/redirect?http://some/anyttany.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html

Upvotes: 1

Jotne

Reputation: 41460

You can use awk like this:

cat file
http://example.com
http://google.com
my.tt.com
t.foo.bar
http://foobar.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/notete.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html

awk -F"http:" '$NF~/tt/' file
my.tt.com
http://example.com/somett.html
http://example.com/http.html
http://example.com/tt.html
http://example.com/somett.html
http://example.com/somettsome.html
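
A short sketch of why this works, on made-up sample lines (the temporary file and its contents are assumptions for illustration). With http: as the field separator, $NF is whatever follows the last http: on the line, or the whole line when there is no http: at all, and only that part is tested for tt:

```shell
# Hypothetical sample data (not from the answer's test file).
sample=$(mktemp)
printf '%s\n' \
  'http://example.com' \
  'my.tt.com' \
  'http://example.com/http.html' \
  'http://example.com/notete.html' \
  'https://example.com' > "$sample"

# Count the lines whose last "http:"-separated field contains "tt".
# The https line has no "http:" in it, so the whole line is tested
# and the "tt" of "https" matches.
count=$(awk -F'http:' '$NF ~ /tt/ { n++ } END { print n + 0 }' "$sample")
echo "$count"   # prints 3
rm -f "$sample"
```

If the input may contain https URLs, the separator would need to cover them too, for example -F'https?:' (an untested variation, not part of the original answer).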

Upvotes: 0
