Extracting words in tags in Python

Question

Hello, I'd like to extract the content of this tag:

Deep injustice

Tags like this occur in many sentences of my text.

df['text'].str.extractall(r'^<(?P<Sentiments>\w+).*[int]?.*(?P<Intensite>\d?\d)>(?P<Expression>[a-zA-Z]*?.*[a-zA-Z]*)<')

My code extracts only a few of the tags. Why doesn't it extract the others?

                  Sentiments Intensite               Expression
      match                                                    
405   0         Disagreement         3    Bizarre contradiction
921   0         Satisfaction         5           La plus simple
2549  0      Dissatisfaction         3     Ne me contentant pas

Wiktor Stribiżew · Accepted Answer

You may use

df['text'].str.extractall(r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>[^<]*)')

See the regex demo.
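A minimal runnable sketch of this call. The sample rows are hypothetical: the tag layout `<Name int=N>text</Name>` is an assumption inferred from the pattern itself, since the question does not show the full tag markup.

```python
import pandas as pd

# Hypothetical sample data: the <Name int=N>text</Name> layout is an assumption
# inferred from the answer's pattern, not taken from the asker's real data.
df = pd.DataFrame({"text": [
    "Je trouve cela <Dissatisfaction int=3>Deep injustice</Dissatisfaction>.",
    # Adjacent string literals are joined, so this second row holds two tags.
    "<Satisfaction int=5>La plus simple</Satisfaction> et "
    "<Disagreement int=3>Bizarre contradiction</Disagreement>",
]})

# The accepted answer's pattern; each named group becomes one output column.
pattern = r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>[^<]*)'

# extractall returns one row per match, indexed by (original row, match number).
result = df['text'].str.extractall(pattern)
print(result)
```

Rows with no tag are simply absent from the result, and a row with several tags gets several `match` entries. Closing tags such as `</Satisfaction>` are not matched because `\w+` cannot match the `/`.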

Details

< - a < char
(?P<Sentiments>\w+) - Group "Sentiments": 1 or more letters, digits or underscores
\s+ - 1+ whitespace
int= - a substring
(?P<Intensite>\d+) - Group "Intensite": 1+ digits
> - a > char
(?P<Expression>[^<]*) - Group "Expression": 0 or more chars other than <
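The answer does not spell out why the original pattern misses tags. Running both patterns on the same hypothetical rows (tag layout assumed to be `<Name int=N>text</Name>`) suggests two causes. First, `^` without `re.MULTILINE` only matches at the very start of each string, so a row that does not begin with `<` yields nothing and any row yields at most one match. Second, the greedy `.*` before `(?P<Intensite>\d?\d)>` can run past the first tag into a later one, pairing one tag's name with another tag's value and text. Note also that `[int]?` is a character class matching one optional `i`, `n` or `t`, not the word `int`.

```python
import pandas as pd

# Hypothetical rows; the <Name int=N>text</Name> layout is an assumption.
df = pd.DataFrame({"text": [
    "<Satisfaction int=5>La plus simple</Satisfaction> est la meilleure.",
    "Je trouve cela <Dissatisfaction int=3>Deep injustice</Dissatisfaction>.",
    "<Agreement int=4>Vraiment bien</Agreement> et "
    "<Disagreement int=2>Pas du tout</Disagreement>",
]})

# Question's pattern: anchored with ^ and built on greedy .* runs.
question = (r'^<(?P<Sentiments>\w+).*[int]?.*(?P<Intensite>\d?\d)>'
            r'(?P<Expression>[a-zA-Z]*?.*[a-zA-Z]*)<')
# Answer's pattern: unanchored, and the text group stops at the next <.
answer = r'<(?P<Sentiments>\w+)\s+int=(?P<Intensite>\d+)>(?P<Expression>[^<]*)'

old = df['text'].str.extractall(question)  # skips row 1, mixes tags in row 2
new = df['text'].str.extractall(answer)    # finds every opening tag
print(old)
print(new)
```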
