Reputation: 199
I have a DataFrame that contains a list of file names. It looks like this:
fname
ill_2_uctry.pdf
ell_23_uctry.pdf
fgy_4_uctry.pdf
:
:
:
hilll_234_uctry.pdf
I want to split the strings in the fname column to create a new name column, which should look like this:
fname name
ill_2_uctry.pdf ill_2
ell_23_uctry.pdf ell_23
fgy_4_uctry.pdf fgy_4
: :
: :
: :
hilll_234_uctry.pdf hilll_234
I tried using split('_'), but it returns only the first part of the string, e.g. ill, instead of the output I want above.
Am I using the correct method, or should I consider a different one?
Thanks all!
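A minimal sketch that reproduces the problem, assuming the frame is called df and using only the four sample names shown above: str.split('_') returns a list of every piece for each row, so taking element 0 keeps only the prefix before the first underscore.

```python
import pandas as pd

# The four sample names from the question; the frame name `df` is an assumption
df = pd.DataFrame({"fname": ["ill_2_uctry.pdf", "ell_23_uctry.pdf",
                             "fgy_4_uctry.pdf", "hilll_234_uctry.pdf"]})

# str.split returns a list of ALL pieces per row, e.g. ['ill', '2', 'uctry.pdf']
parts = df["fname"].str.split("_")
print(parts.tolist())

# Indexing element 0 reproduces the problem: only the prefix survives
first_only = parts.str[0]
print(first_only.tolist())
```

So split itself is fine; the issue is that the wanted name spans two of the pieces, which the answers below handle in different ways.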
Upvotes: 0
Views: 270
Reputation: 50819
You can use rsplit with n=1 and expand=True to split on the last occurrence of _.
n: Limit number of splits in output.
expand: Expand the split strings into separate columns.
df['name'] = df['fname'].str.rsplit('_', n=1, expand=True)[0]
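A small sketch of the intermediate result, assuming the four sample names from the question: with expand=True, rsplit returns a DataFrame with columns 0 and 1, and column 0 holds everything before the last underscore. Without expand, .str[0] on the resulting lists gives the same names. This relies on the suffix (_uctry.pdf here) containing no underscore of its own.

```python
import pandas as pd

df = pd.DataFrame({"fname": ["ill_2_uctry.pdf", "ell_23_uctry.pdf",
                             "fgy_4_uctry.pdf", "hilll_234_uctry.pdf"]})

# Split once, starting from the right: 'ill_2_uctry.pdf' -> ['ill_2', 'uctry.pdf']
pieces = df["fname"].str.rsplit("_", n=1, expand=True)
print(pieces)

# Column 0 is everything before the last underscore
df["name"] = pieces[0]

# Equivalent without expand: each row is a two-element list, take element 0
alt = df["fname"].str.rsplit("_", n=1).str[0]
```

Passing n as a keyword matters on recent pandas versions, where arguments to str.rsplit other than pat are keyword-only.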
Upvotes: 2
Reputation: 1749
Try this; it gives the output you want:
d['name'] = d['fname'].apply(lambda x: '_'.join(x.split('_')[:2]))
This should work!
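The same split-and-rejoin logic can also be written without a Python-level lambda, using the .str accessor. A sketch, assuming the frame is named d as in the snippet: .str[:2] slices each list to its first two pieces and .str.join('_') glues them back together. Note that this keeps the first two pieces while the rsplit answer drops the last piece, so the two approaches disagree when a name has an extra underscore; the edge-case name below is hypothetical.

```python
import pandas as pd

d = pd.DataFrame({"fname": ["ill_2_uctry.pdf", "ell_23_uctry.pdf",
                            "fgy_4_uctry.pdf", "hilll_234_uctry.pdf"]})

# Vectorized equivalent of '_'.join(x.split('_')[:2]) for every row
d["name"] = d["fname"].str.split("_").str[:2].str.join("_")

# The apply/lambda version, for comparison
via_apply = d["fname"].apply(lambda x: "_".join(x.split("_")[:2]))

# Hypothetical name with an extra underscore: the two strategies differ
edge = pd.Series(["ab_cd_12_uctry.pdf"])
first_two = edge.str.split("_").str[:2].str.join("_")  # keeps 'ab_cd'
before_last = edge.str.rsplit("_", n=1).str[0]          # keeps 'ab_cd_12'
```

Which one is right depends on whether the name is "the first two parts" or "everything except the suffix".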
Upvotes: 0
Reputation: 521279
Using str.extract (expand=False returns a Series instead of a one-column DataFrame):
df["name"] = df["fname"].str.extract(r'^([^_]+_[^_]+)', expand=False)
Here is a demo showing that the regex logic works correctly.
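A sketch of how the pattern behaves, assuming the sample names from the question plus one hypothetical name without any underscore. ^([^_]+_[^_]+) anchors at the start and captures a run of non-underscore characters, one underscore, and another run of non-underscore characters, i.e. everything before the second underscore. Rows that do not match get NaN.

```python
import pandas as pd

df = pd.DataFrame({"fname": ["ill_2_uctry.pdf", "ell_23_uctry.pdf",
                             "fgy_4_uctry.pdf", "hilll_234_uctry.pdf",
                             "nounderscore.pdf"]})  # last row is hypothetical

# Capture everything up to (not including) the second underscore
df["name"] = df["fname"].str.extract(r"^([^_]+_[^_]+)", expand=False)
print(df)
```

If unmatched rows should fall back to the original value instead of NaN, df["name"].fillna(df["fname"]) is one option.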
Upvotes: 2