mrCarnivore

Reputation: 5078

Match everything except a specific string

I have looked at lots of posts with similar titles, but I have found nothing that works with Python or even on this site: https://regex101.com

How can I match everything but a specific text?

My text:

1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

1234_This is a text Word BCD    
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            - 

I want to match Word \w+ and then the rest until the next 1234. So the result should be (return groups marked in ()):

(1234_This is a text (Word AB))(

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

)(1234_This is a text (Word BCD))(    
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            - )

The first part is easy: matches = re.findall(r'1234_This is a text (Word \w+)', var). But I am unable to achieve the next part. I have tried a negative lookahead, ^(?!1234), but then it no longer matches anything.

Upvotes: 2

Views: 1160

Answers (2)

Aaditya Ura

Reputation: 12679

As you stated:

I want to match Word \w+ and then the rest until the next 1234.

Do you want something like this?

import re
pattern=r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'
string="""1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -"""

match=re.finditer(pattern,string,re.M)
for find in match:
    print("this is group_1 {}".format(find.group(1)))
    print("this is group_3 {}".format(find.group(3)))
    print("this is group_4 {}".format(find.group(4)))

output:

this is group_1 1234_This is a text Word AB
this is group_3 Word AB
this is group_4 

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            
this is group_1 1234_This is a text Word BCD
this is group_3 Word BCD
this is group_4 
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -

Upvotes: 1

ctwheels

Reputation: 22837

Code


(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)

With the s modifier (which makes . match newlines as well), you can use the following instead.

(1234[\w ]+(Word \w+))((?:(?!1234).)*)
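In Python, the s modifier corresponds to the re.DOTALL flag. As a minimal sketch, assuming a shortened version of the question's sample text is stored in a variable named text (a name chosen here for illustration), both patterns should return the same groups:

```python
import re

# Shortened sample input from the question (the variable name is illustrative).
text = (
    "1234_This is a text Word AB\n"
    "\n"
    "Protocol  Address          ping\n"
    "Internet  1.1.1.1            - \n"
    "Internet  1.1.1.2            25 \n"
    "\n"
    "1234_This is a text Word BCD    \n"
    "Protocol  Address          ping\n"
    "Internet  2.2.2.1            10 \n"
    "Internet  2.2.2.2            - "
)

# Variant 1: [\s\S] matches any character, newlines included, with no flag.
no_flag = re.findall(r"(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)", text)

# Variant 2: '.' matches newlines only when re.DOTALL (the "s" modifier) is set.
dotall = re.findall(r"(1234[\w ]+(Word \w+))((?:(?!1234).)*)", text, re.DOTALL)

same_result = no_flag == dotall
words = [groups[1] for groups in dotall]
print(same_result)
print(words)
```

Without re.DOTALL, the second pattern would stop at the first newline, so the third group would only hold the rest of the header line.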

Explanation

  • (1234[\w ]+(Word \w+)) Capture the following into capture group 1
    • 1234 Match this literally
    • [\w ]+ Match one or more word characters or spaces
    • (Word \w+) Capture the following into capture group 2
      • Word Match this literally (note the trailing space)
      • \w+ Match any word character one or more times
  • ((?:(?!1234)[\s\S])*) Capture the following into capture group 3
    • (?:(?!1234)[\s\S])* Match the following any number of times (tempered greedy token)
      • (?!1234) Negative lookahead ensuring what follows is not 1234
      • [\s\S] Match any single character, including newlines
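The three capture groups described above can be pulled out in Python with re.finditer. This is only a sketch: the helper name split_records and the shortened sample string are assumptions made for illustration, not part of the original answer.

```python
import re

# The first pattern from this answer, compiled once for reuse.
RECORD = re.compile(r"(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)")

def split_records(text):
    """Return one (header, word, body) tuple per block starting with 1234."""
    return [(m.group(1), m.group(2), m.group(3)) for m in RECORD.finditer(text)]

# Shortened, hypothetical sample in the same shape as the question's text.
sample = (
    "1234_This is a text Word AB\n"
    "\n"
    "Internet  1.1.1.1            - \n"
    "\n"
    "1234_This is a text Word BCD    \n"
    "Internet  2.2.2.1            10 "
)

records = split_records(sample)
for header, word, body in records:
    print(repr(header), repr(word), repr(body))
```

Note that (?!1234) ends the body at any occurrence of 1234, not only at a header line, so a stricter lookahead would be needed if the data itself could contain that digit sequence.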

Upvotes: 3
