Reputation: 7490
I have a pandas column like this:
LOD-NY-EP-ADM
LOD-NY-EC-RUL
LOD-NY-EC-WFL
LOD-NY-LSM-SER
LOD-NY-PM-MOB
LOD-NY-PM-MOB
LOD-NY-RMK
LOD-NY-EC-TIM
I want the output in a new column as:
EP
EC
EC
LSM
PM
PM
RMK
EC
I tried this:
pattern=df.column[0:10].str.extract(r"\w*-NY-(.*?)-\w*",expand=False)
This works for every row except LOD-NY-RMK, which comes back as NaN because nothing follows RMK. The pattern ends in -\w*, and since \w* matches zero or more characters, I expected it to still match when nothing follows RMK.
Any idea what's going wrong?
A plain-Python answer that applies a regular expression to a list of these strings is also fine if the pandas syntax is unfamiliar.
Upvotes: 0
Views: 194
Reputation: 3300
pattern=df.column[0:10].str.extract(r"\w*-NY-(\w+)",expand=False)
See https://regex101.com/r/3uDpam/3
Your regex required matching strings to contain three - characters. I changed it so the last -XX could occur 0 or 1 times.
UPDATE: Changed so the 2nd group is non-capturing (added ?:).
UPDATE: Thanks to Casimir, removed the useless group at the end of the pattern.
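As a rough illustration of the difference, here is a minimal sketch using Python's standard re module on a plain list instead of a pandas column. The sample values are copied from the question; the names OLD, NEW and extract_code are hypothetical and only serve the example. It relies on search semantics (first match anywhere in the string), which is also how str.extract picks its match.

```python
import re

# Sample values taken from the question.
values = ["LOD-NY-EP-ADM", "LOD-NY-LSM-SER", "LOD-NY-RMK"]

# Pattern from the question: the "-" after the group is a literal and is
# mandatory, even though the \w* that follows it may match nothing.
OLD = re.compile(r"\w*-NY-(.*?)-\w*")

# Pattern from this answer: nothing is required after the captured group.
NEW = re.compile(r"\w*-NY-(\w+)")


def extract_code(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


old_result = [extract_code(OLD, v) for v in values]
new_result = [extract_code(NEW, v) for v in values]

print(old_result)  # ['EP', 'LSM', None]
print(new_result)  # ['EP', 'LSM', 'RMK']
```

The None for LOD-NY-RMK corresponds to the NaN seen in pandas: the quantifier in -\w* applies only to \w, not to the hyphen in front of it.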
Upvotes: 1
Reputation: 13498
Could you just use regular Python? Let df be your dataframe, and row be the name of your column.
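import pandas as pd
series = df.row
# Take the third hyphen-separated field (EP, EC, RMK, ...)
new_list = [i.split('-')[2] for i in series]
# Reuse the original index so the result aligns with df
new_series = pd.Series(new_list, index=series.index)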
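The same split can also be done with the pandas string accessor, which keeps the index aligned without building an intermediate list. This is only a sketch: the sample frame, the extra "LOD-NY" value and the output column name code are assumptions made for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical sample frame; "LOD-NY" is added to show a value with no third field.
df = pd.DataFrame({"row": ["LOD-NY-EP-ADM", "LOD-NY-RMK", "LOD-NY"]})

# Split each string on "-" and take the element at position 2.
# Where that position does not exist, .str[2] yields NaN instead of raising.
df["code"] = df["row"].str.split("-").str[2]

print(df)
```

Unlike the list comprehension, which raises IndexError on a value with fewer than three fields, this version leaves a missing value in that row.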
Upvotes: 1