Substring with two possibilities regex

Question

I extracted one long string from a web page. Applying this regex to it:

 x=re.findall(r"(?:l'article)\s\d+\w+.*;", xpath)

It extracted the following 2 strings:

 l'article 1382 du code civil ;
 l'article 700 du code de procédure civile, les condamne à payer à la société Financière du cèdre la somme globale de 3 000 euros et rejette leurs demandes ;

However, the second one is too long. All I need is the text up to the ','. Is there a way to do this directly, so that my original regex stops at either ';' or ',', whichever it encounters first?

If not, can I apply a regex to a list, or do I need to write a loop for that?

Required outcome: a list with:

 l'article 1382 du code civil
 l'article 700 du code de procédure civile

Note: I have to apply this to many pages, and a page may contain many more of these matches. Doing anything by hand, or picking out a specific list entry by index, is not possible.

Neil · Accepted Answer

You are missing two things. First, the lazy (non-greedy) quantifier: a ? placed after .* makes the regex stop at the first occurrence of the delimiter instead of the last one. Second, a character class: [] matches any one of the characters listed inside it, so [;,] matches either ';' or ','. Here is the new pattern:

 (?:l'article)\s\d+\w+.*?[;,]
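Note that this pattern still includes the trailing ';' or ',' in each match, while the required outcome does not. A minimal sketch of one way to leave the delimiter out, assuming Python's `re` module as in the question: replace the final character class with a lookahead, which checks for the delimiter without consuming it. The variable name `text` and the sample string (assembled from the two matches quoted in the question) are assumptions for illustration; in the question the input is called `xpath`.

```python
import re

# Sample input built from the two matches quoted in the question;
# the real input would be the string extracted from the web page.
text = (
    "l'article 1382 du code civil ; "
    "l'article 700 du code de procédure civile, les condamne à payer "
    "à la société Financière du cèdre la somme globale de 3 000 euros "
    "et rejette leurs demandes ;"
)

# Lazy .*? stops at the first delimiter; the lookahead (?=...) requires
# optional whitespace followed by ';' or ',' but keeps it out of the match.
pattern = re.compile(r"l'article\s\d+\w+.*?(?=\s*[;,])")

articles = pattern.findall(text)
print(articles)
```

Since the pattern has no capturing group, `findall` returns the full matches as a list of strings, so no per-entry handling is needed.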

Regex101:

https://regex101.com/r/tYkNHK/1
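As for the second question: the functions in `re` work on one string at a time, so a list does need some form of loop, but a list comprehension keeps it to one line. A rough sketch, assuming the list already holds the over-long matches from the original regex (the names `matches` and `trimmed` are made up for this example):

```python
import re

# The two strings returned by the original regex, as quoted in the question.
matches = [
    "l'article 1382 du code civil ;",
    "l'article 700 du code de procédure civile, les condamne à payer "
    "à la société Financière du cèdre la somme globale de 3 000 euros "
    "et rejette leurs demandes ;",
]

# Split each string once at the first ';' or ',' (plus any whitespace
# before it) and keep only the part in front of the delimiter.
trimmed = [re.split(r"\s*[;,]", m, maxsplit=1)[0] for m in matches]
print(trimmed)
```

Fixing the pattern itself is usually simpler than trimming afterwards, but this approach works when the list has already been produced.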
