Reputation: 5078
I have looked at many posts with similar titles, but I have found nothing that works in Python or even on https://regex101.com
How can I match everything but a specific text?
My text:
1234_This is a text Word AB
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
1234_This is a text Word BCD
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -
I want to match Word \w+ and then the rest until the next 1234.
So the result should be (return groups marked in ()):
(1234_This is a text (Word AB))(
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
)(1234_This is a text (Word BCD))(
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 - )
The first part is easy: matches = re.findall(r'1234_This is a text (Word \w+)', var)
But I am unable to achieve the next part.
I have tried a negative lookahead:
^(?!1234)
but then it no longer matches anything.
Upvotes: 2
Views: 1160
Reputation: 12679
As you stated:
I want to match Word \w+ and then the rest until the next 1234.
Do you want something like this?
import re

# Group 4 collects every following line that does not start with 1234.
pattern = r'((1234_This is a text) (Word\s\w+))((?:\n(?!1234).*)*)'
string = """1234_This is a text Word AB
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
1234_This is a text Word BCD
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -"""

for find in re.finditer(pattern, string):
    print("this is group_1 {}".format(find.group(1)))
    print("this is group_3 {}".format(find.group(3)))
    print("this is group_4 {}".format(find.group(4)))

output:
this is group_1 1234_This is a text Word AB
this is group_3 Word AB
this is group_4
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
this is group_1 1234_This is a text Word BCD
this is group_3 Word BCD
this is group_4
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -
Upvotes: 1
Reputation: 22837
(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)
Using the s modifier (re.DOTALL in Python), you can use the following instead:
(1234[\w ]+(Word \w+))((?:(?!1234).)*)
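Since the question is about Python, here is a minimal sketch of how the two patterns could be applied with the re module. The helper name split_blocks and the variable names are only illustrative, and the sample text is assumed to match the question's input with no blank lines between blocks. Note that the lookahead stops at any occurrence of 1234, not only at the start of a line, so this assumes 1234 never appears inside a block body.

```python
import re

# Sample input copied from the question (assumed: no blank lines between blocks).
text = """1234_This is a text Word AB
Protocol Address ping
Internet 1.1.1.1 -
Internet 1.1.1.2 25
Internet 1.1.1.3 8
Internet 1.1.1.4 -
1234_This is a text Word BCD
Protocol Address ping
Internet 2.2.2.1 10
Internet 2.2.2.2 -"""

# Variant 1: [\s\S] matches any character, newlines included, so no flag is needed.
pattern_any = re.compile(r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)')

# Variant 2: re.DOTALL is Python's "s" modifier, so "." also matches newlines.
pattern_dotall = re.compile(r'(1234[\w ]+(Word \w+))((?:(?!1234).)*)', re.DOTALL)


def split_blocks(pattern, data):
    """Return one (header, word, body) tuple per block (hypothetical helper)."""
    return [m.groups() for m in pattern.finditer(data)]


blocks = split_blocks(pattern_any, text)
for header, word, body in blocks:
    # body still carries its leading newline; strip() is only for display.
    print(header, "|", word)
    print(body.strip())
```

Both variants should yield the same groups on this input; which one to use is mostly a matter of whether you prefer a flag or a self-contained pattern.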
(1234[\w ]+(Word \w+)): Capture the following into capture group 1
  1234: Match this literally
  [\w ]+: Match one or more word characters or spaces
  (Word \w+): Capture the following into capture group 2
    Word : Match this literally (note the trailing space)
    \w+: Match any word character one or more times
((?:(?!1234)[\s\S])*): Capture the following into capture group 3
  (?:(?!1234)[\s\S])*: Match the following any number of times (tempered greedy token)
    (?!1234): Negative lookahead ensuring what follows doesn't match 1234
    [\s\S]: Match any character, including newlines
Upvotes: 3