Ahmed Khalied

Reputation: 19

I need to extract a specific word after a word using regex

I have this series:

[Spanish | Intermediate; Portuguese | Native; English | Advanced,
French | Intermediate; Spanish | Native; English | Native,
Spanish | Native; English | Intermediate,
Portuguese | Native; English | Intermediate; Spanish | Intermediate ]

I want to use regex to extract Spanish together with its level, for example: Spanish | Native.

I used:

y =[]
for i in la:
    x = re.findall(r"[Spanish+[^a-z]+[^a-z]+[^a-z]+"
                   r"Intermediate|Advanced|Native|Beginner]", i)
    y.append(x)

but it does not give the expected result.

Upvotes: 1

Views: 140

Answers (2)

Ujjwal Kumar Maharana

Reputation: 269

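import re

y = []
for i in la:
    # Capture the level that follows "Spanish |"
    x = re.findall(r"Spanish\s*\|\s*(\w+)", i)
    y.append(x)

This extracts every level associated with Spanish. The pattern matches the literal Spanish, then \s*\|\s* matches the pipe symbol with optional whitespace on either side, and (\w+) matches the level word. The parentheses around \w+ form a capture group, so findall returns only the level (for example Native) rather than the whole match. No trailing semicolon is required, so a Spanish entry at the end of a string is matched as well.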
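As a rough, self-contained sketch of this idea on the sample data from the question: it assumes `la` is a plain list of strings (the question does not show how it is built), and the names `spanish_level` and `levels` are made up for illustration.

```python
import re

# Sample rows from the question; treating `la` as a plain list of strings is an assumption.
la = [
    "Spanish | Intermediate; Portuguese | Native; English | Advanced",
    "French | Intermediate; Spanish | Native; English | Native",
    "Spanish | Native; English | Intermediate",
    "Portuguese | Native; English | Intermediate; Spanish | Intermediate",
]

# Capture only the level that follows "Spanish |"; no trailing semicolon is required.
spanish_level = re.compile(r"Spanish\s*\|\s*(\w+)")

levels = []
for row in la:
    match = spanish_level.search(row)
    # Store None when a row has no Spanish entry.
    levels.append(match.group(1) if match else None)

print(levels)  # ['Intermediate', 'Native', 'Native', 'Intermediate']
```

Using search instead of findall gives one value per row, which may be easier to put back into a column; findall is the better fit if a row could list Spanish more than once.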

Upvotes: 0

Trenton Telge

Reputation: 488

To get the groups of Language | Level, you can use \w+\s\|\s\w+. This looks for a word, then whitespace, then a pipe, then whitespace, then a word.
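A small sketch of how that pattern might be used, and one possible way to narrow it to Spanish by replacing the first \w+ with the literal word. The sample string `text` is one row copied from the question, and the variable names are hypothetical.

```python
import re

# One row from the question, used here as a sample input.
text = "French | Intermediate; Spanish | Native; English | Native"

# Every "Language | Level" pair: word, whitespace, pipe, whitespace, word.
pairs = re.findall(r"\w+\s\|\s\w+", text)

# Only the Spanish pair: the first \w+ is replaced with the literal word.
spanish = re.findall(r"Spanish\s\|\s\w+", text)

print(pairs)    # ['French | Intermediate', 'Spanish | Native', 'English | Native']
print(spanish)  # ['Spanish | Native']
```

Note that \s matches exactly one whitespace character, so this assumes a single space on each side of the pipe, as in the question's data.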

Upvotes: 1
