Reputation: 441
I want to exclude everything from 'un' onward whenever it is encountered in a given string. Below is my code; it only outputs French! Le@ Mans - Quevilly Ro instead of French! Le@ Mans - Quevilly Rouen. Any help on how to work that out would be appreciated.
import requests, bs4, re
get_reg = re.compile(r'''
([\w+\W+]*(\s\w+)*\s-\s+\w+[^'un']*) #teams
(\s\w+\s) #tip
(@\d+.\d+)
''', re.VERBOSE)
print(get_reg.findall("French! Le@ Mans - Quevilly Rouen un3.5 @1.23"))
Upvotes: 0
Views: 39
Reputation: 20397
[] matches any one of the characters inside it, so [\w+\W+] matches any character that is either a word character, not a word character, or +, which is clearly not what you intended. Similarly, [^'un'] matches any character that is not one of ', u or n.
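To see why the original pattern stops at "Ro", here is a minimal sketch of how those two character classes behave. The shortened sample string and the variable names are only for illustration; they are not from the question.

```python
import re

# Shortened sample (an assumption for illustration, not the asker's full input).
text = "Quevilly Rouen un3.5"

# [^'un'] is a negated character class: it rejects ', u and n one character
# at a time, so the run stops at the first 'u' (inside "Rouen"),
# not at the separate token "un".
m = re.search(r"Quevilly [^'un']*", text)
print(m.group())  # Quevilly Ro

# [\w+\W+] accepts every character, because \w and \W together cover everything.
covers_all = re.fullmatch(r"[\w+\W+]*", text) is not None
print(covers_all)  # True
```

This is why the quotes and the repeated + inside the brackets have no special effect: inside [], each one is just another literal character to match or exclude.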
A solution to the question, as you've asked it, is:
re.findall(r'(.*?)\s*un', "French! Le@ Mans - Quevilly Rouen un3.5 @1.23")
Demo: https://regexr.com/40806
This non-greedily matches anything followed by optional whitespace and un, and gives you that first part back as a group.
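As a rough illustration of what "non-greedily" buys you here, the sketch below runs the same pattern with a lazy and a greedy quantifier on the string from the question. The variable names are just for this example.

```python
import re

text = "French! Le@ Mans - Quevilly Rouen un3.5 @1.23"

# Lazy (.*?) grows one character at a time, so \s* gets to consume the space
# before "un" and the captured group has no trailing whitespace.
lazy = re.findall(r'(.*?)\s*un', text)
print(lazy)    # ['French! Le@ Mans - Quevilly Rouen']

# Greedy (.*) grabs as much as it can, including the space, leaving \s*
# to match nothing. It would also run to the LAST "un" if there were several.
greedy = re.findall(r'(.*)\s*un', text)
print(greedy)  # ['French! Le@ Mans - Quevilly Rouen ']
```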
However, it looks from your code that you are trying to match other parts of the string too, and from the discussion in comments, I think what you want is:
get_reg = re.compile(r'(.*?)\s*(un\d+\.\d+)\s*(@\d+\.\d+)')
print(get_reg.findall("French! Le@ Mans - Quevilly Rouen un3.5 @1.23"))
Demo: https://regexr.com/4085t
I've excluded the spaces between the parts from the matching groups, although that's different from your example, because I suspect that this would be more useful to you.
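For what it's worth, here is one way the three groups might be pulled apart once the pattern matches. This is only a sketch: the names line, teams, tip and odds are my own (loosely following the comments in the question's code), and the dots are written as \. so that they match only a literal decimal point.

```python
import re

get_reg = re.compile(r'(.*?)\s*(un\d+\.\d+)\s*(@\d+\.\d+)')
line = "French! Le@ Mans - Quevilly Rouen un3.5 @1.23"

# findall returns one tuple per match, with one entry per capturing group.
rows = get_reg.findall(line)
print(rows)  # [('French! Le@ Mans - Quevilly Rouen', 'un3.5', '@1.23')]

# search + groups() is handy when only the first match is expected.
match = get_reg.search(line)
if match:
    teams, tip, odds = match.groups()  # hypothetical names for the three parts
    print(teams)  # French! Le@ Mans - Quevilly Rouen
    print(tip)    # un3.5
    print(odds)   # @1.23
```

Because the \s* parts sit outside the parentheses, none of the captured strings carry leading or trailing spaces.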
Upvotes: 2