Exclude a given string using Regex in python

Question

I want to exclude everything after 'un' when it is encountered in a given string. My code below outputs only French! Le@ Mans - Quevilly Ro instead of French! Le@ Mans - Quevilly Rouen. Any help on how to fix this would be appreciated.

import re

get_reg = re.compile(r'''
    ([\w+\W+]*(\s\w+)*\s-\s+\w+[^'un']*)  #teams
    (\s\w+\s) #tip
    (@\d+.\d+)
    ''', re.VERBOSE)
print(get_reg.findall("French! Le@ Mans - Quevilly Rouen un3.5 @1.23"))

SpoonMeiser · Accepted Answer

[] is a character class: it matches any single one of the characters inside it. So [\w+\W+] matches any character that is either a word character, not a word character, or +, which is clearly not what you intended. Similarly, [^'un'] matches any character that is not one of ', u or n, which is why your match stops at the u in Rouen.
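To see this for yourself, here is a minimal sketch that isolates the two character classes from your pattern. The variable names are only for illustration.

```python
import re

# [^'un'] is a negated character class: it rejects the single characters
# ', u and n. It does not reject the two-letter sequence "un".
negated = re.compile(r"[^'un']*")

# Matching stops at the first 'u', which is inside "Rouen", not at " un3.5".
stopped_at = negated.match("Rouen un3.5").group()
print(stopped_at)  # Ro

# [\w+\W+] is also just a set of single characters. \w and \W together
# already cover every character, so the class accepts anything.
any_char = re.compile(r"[\w+\W+]")
print(all(any_char.fullmatch(ch) for ch in "a1 +!@-"))  # True
```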

A solution to the question, as you've asked it, is:

re.findall(r'(.*?)\s*un', "French! Le@ Mans - Quevilly Rouen un3.5 @1.23")

Demo: https://regexr.com/40806

This non-greedily matches everything up to the first occurrence of optional whitespace followed by un, and gives you the part before it back as a group.
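A quick sketch of what that returns, plus one caveat worth knowing about. The second input string is made up for illustration and is not from your data.

```python
import re

text = "French! Le@ Mans - Quevilly Rouen un3.5 @1.23"

# (.*?) grows one character at a time until "\s*un" can match right after it,
# so the capture ends before the whitespace that precedes "un".
teams = re.findall(r'(.*?)\s*un', text)
print(teams)  # ['French! Le@ Mans - Quevilly Rouen']

# Caveat: the first "un" anywhere ends the capture, even inside a team name.
# (Hypothetical input, chosen only because it contains "un" early on.)
split_early = re.findall(r'(.*?)\s*un', "Sunderland - Hull un2.5 @1.80")
print(split_early)  # ['S', 'derland - Hull']
```

If your team names can contain "un", anchoring on more of the structure after it is likely the safer choice.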

However, it looks from your code that you are trying to match other parts of the string too, and from the discussion in comments, I think what you want is:

get_reg = re.compile(r'(.*?)\s*(un\d+\.\d+)\s*(@\d+\.\d+)')
print(get_reg.findall("French! Le@ Mans - Quevilly Rouen un3.5 @1.23"))

Demo: https://regexr.com/4085t

I've excluded the spaces between the parts from the matching groups, although that's different from your example, because I suspect that this would be more useful to you.
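If you only expect one match per string, it may be more convenient to use search and unpack the groups. This is a sketch using the same three-group pattern, with the decimal points written as \. so that only a literal dot is accepted. The names teams and tip come from the comments in your code; odds is my guess at what the last part means.

```python
import re

# Three groups: team names, the "un" tip, and the "@" figure.
# Whitespace between the parts is matched but not captured.
get_reg = re.compile(r'(.*?)\s*(un\d+\.\d+)\s*(@\d+\.\d+)')

match = get_reg.search("French! Le@ Mans - Quevilly Rouen un3.5 @1.23")
if match:
    # "odds" is an assumed name for the third group.
    teams, tip, odds = match.groups()
    print(repr(teams))  # 'French! Le@ Mans - Quevilly Rouen'
    print(repr(tip))    # 'un3.5'
    print(repr(odds))   # '@1.23'
```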
