Reputation: 199

How to split a string into two strings based on a delimiter in a DataFrame

I have a DataFrame that contains a list of file names. It looks like this:

fname
ill_2_uctry.pdf
ell_23_uctry.pdf
fgy_4_uctry.pdf
:
:
:
hilll_234_uctry.pdf

I want to split the strings in the fname column and store the result in a new column, name, which should look like this:

fname                       name
ill_2_uctry.pdf            ill_2
ell_23_uctry.pdf           ell_23
fgy_4_uctry.pdf            fgy_4
:                            :
:                            :
:                            :
hilll_234_uctry.pdf        hilll_234

I tried split('_'), but the output only contains the first part of the string (ill instead of ill_2). Am I using the correct method, or should I consider a different one?

Thanks all!

Upvotes: 0

Answers (4)

Unsung

Reputation: 1

df["name"] = df.fname.str.extract("(^[^_]*_[^_]*)")

Upvotes: 0

Guy

Reputation: 50819

You can use rsplit with n=1 and expand=True to split on the last occurrence of _.

n: Limit number of splits in output

expand: Expand the split strings into separate columns.

df['name'] = df['fname'].str.rsplit('_', n=1, expand=True)[0]
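
For anyone who wants to try this end to end, here is a minimal sketch built on the sample file names from the question. It passes n by keyword, which, as far as I know, newer pandas releases require for rsplit; the intermediate variable parts is only there for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "fname": [
        "ill_2_uctry.pdf",
        "ell_23_uctry.pdf",
        "fgy_4_uctry.pdf",
        "hilll_234_uctry.pdf",
    ]
})

# Split once, starting from the right. With expand=True the pieces land in
# separate columns: column 0 holds everything before the last "_",
# column 1 holds the remainder ("uctry.pdf").
parts = df["fname"].str.rsplit("_", n=1, expand=True)
df["name"] = parts[0]

print(df)
```

Because the split starts from the right, this keeps everything before the last underscore, however many underscores come before it.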

Upvotes: 2

Tamil Selvan

Reputation: 1749

Try this; it should give you the output you want:

d['name'] = d['fname'].apply(lambda x:'_'.join(x.split('_')[:2]))

This should work!
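
One thing worth checking before picking between this and rsplit: the two agree on the file names in the question, but they can differ when a name contains more underscores. The sketch below compares them; the second file name is made up for illustration and is not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "fname": [
        "ill_2_uctry.pdf",   # from the question
        "a_b_3_uctry.pdf",   # hypothetical name with an extra underscore
    ]
})

# Keep only the first two "_"-separated pieces
df["first_two"] = df["fname"].apply(lambda x: "_".join(x.split("_")[:2]))

# Keep everything before the last "_"
df["before_last"] = df["fname"].str.rsplit("_", n=1).str[0]

print(df)
```

If every file name follows the prefix_number_uctry.pdf pattern, either approach is fine; otherwise pick the one that matches the part you actually want to keep.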

Upvotes: 0

Tim Biegeleisen

Reputation: 521279

Using str.extract:

df["name"] = df["fname"].str.extract(r'^([^_]+_[^_]+)')

Here is a demo showing that the regex logic works correctly.
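
As a quick local check of the same pattern (a small stand-in written with Python's re module, not the demo referred to above), the regex can be run against the file names from the question:

```python
import re

# Same pattern as in the answer: one or more non-underscore characters,
# an underscore, then one or more non-underscore characters,
# anchored at the start of the string.
pattern = re.compile(r'^([^_]+_[^_]+)')

fnames = [
    "ill_2_uctry.pdf",
    "ell_23_uctry.pdf",
    "fgy_4_uctry.pdf",
    "hilll_234_uctry.pdf",
]

# Collect the first capture group for each file name
names = [pattern.match(fname).group(1) for fname in fnames]

for fname, name in zip(fnames, names):
    print(fname, "->", name)
```

str.extract applies the same capture group to each row of the column, so the name column should hold the same values.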

Upvotes: 2
