Baktaawar

Reputation: 7490

Extracting substring in python string using regular expression

I have a pandas column like this:

LOD-NY-EP-ADM
LOD-NY-EC-RUL
LOD-NY-EC-WFL
LOD-NY-LSM-SER
LOD-NY-PM-MOB
LOD-NY-PM-MOB
LOD-NY-RMK
LOD-NY-EC-TIM

I want the output in a new column as:

EP
EC
EC
LSM
PM
PM
RMK
EC

I tried this:

pattern=df.column[0:10].str.extract(r"\w*-NY-(.*?)-\w*",expand=False)

This works for every row except LOD-NY-RMK, where it returns NaN. There is nothing after RMK, and the pattern looks for -\w zero or more times, so I expected it to still match when nothing follows RMK.

Any idea what's going wrong?

If the pandas syntax is unfamiliar, feel free to answer using a plain array of these strings and a regular expression.
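For anyone who wants to reproduce this without pandas, here is a minimal sketch using the standard `re` module on a plain list. The names `values` and `extract` are made up for illustration and are not part of the original code.

```python
import re

# Same pattern as in the pandas call above
pattern = re.compile(r"\w*-NY-(.*?)-\w*")

# A few of the sample strings from the column
values = ["LOD-NY-EP-ADM", "LOD-NY-LSM-SER", "LOD-NY-RMK"]

def extract(value):
    """Return the captured group, or None when the pattern does not match."""
    match = pattern.search(value)
    return match.group(1) if match else None

results = [extract(v) for v in values]
print(results)  # ['EP', 'LSM', None]
```

The last value comes back as None, which corresponds to the NaN that pandas reports.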

Upvotes: 0

Views: 194

Answers (2)

CodeBoy

Reputation: 3300

pattern=df.column[0:10].str.extract(r"\w*-NY-(\w+)",expand=False)

See https://regex101.com/r/3uDpam/3

Your regex required every matching string to contain three - characters, so LOD-NY-RMK, which has only two, could not match. I changed it so the last -XX part could occur 0 or 1 times.

UPDATE: Changed so 2nd group is non-capturing (added ?:)

UPDATE: Thanks to Casimir, removed useless group at end of pattern
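As a quick check, the sketch below runs both patterns on the sample data from the question. The DataFrame is rebuilt here by hand, and the variable names `old` and `new` are only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"column": [
    "LOD-NY-EP-ADM", "LOD-NY-EC-RUL", "LOD-NY-EC-WFL", "LOD-NY-LSM-SER",
    "LOD-NY-PM-MOB", "LOD-NY-PM-MOB", "LOD-NY-RMK", "LOD-NY-EC-TIM",
]})

# Original pattern: the trailing "-" is mandatory, so LOD-NY-RMK gives NaN
old = df["column"].str.extract(r"\w*-NY-(.*?)-\w*", expand=False)

# Revised pattern: \w+ stops at the next "-" or at the end of the string
new = df["column"].str.extract(r"\w*-NY-(\w+)", expand=False)

print(pd.DataFrame({"old": old, "new": new}))
```

Because \w does not match a hyphen, the capture group ends at the third field whether or not anything follows it.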

Upvotes: 1

Primusa

Reputation: 13498

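Could you just use regular Python? Let df be your DataFrame and row be the name of your column.

import pandas as pd

series = df.row
# Take the third hyphen-separated field of each value
new_list = [i.split('-')[2] for i in series]
# Reuse the original index so the result lines up with df
new_series = pd.Series(new_list, index=series.index)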
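The same split can also be done with the pandas string accessor, which keeps the original index automatically. This is a sketch on made-up data: the index values and the new column name `code` are assumptions chosen for illustration.

```python
import pandas as pd

# Small sample with a non-default index to show that alignment is preserved
df = pd.DataFrame(
    {"row": ["LOD-NY-EP-ADM", "LOD-NY-RMK", "LOD-NY-LSM-SER"]},
    index=[10, 11, 12],
)

# Split each string on "-" and take the third field (position 2)
df["code"] = df["row"].str.split("-").str[2]

print(df)
```

Both versions assume every value has at least three hyphen-separated fields. With a shorter value, the list comprehension raises an IndexError, while the `.str[2]` form yields NaN for that row.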

Upvotes: 1
