Luke Parker

Reputation: 1

Slice strings in a series depending on position of character

In my dataframe I want to slice my strings to remove a preamble from the data. The only trouble is that the preamble varies in length, so I need to work out where the part I want to keep starts in each row.

Before:

Day 1 - abc
Day 2 - bcd
DAY 10 - DFE

After:

abc
bcd
DFE

I understand why the following doesn't work, but thought I would provide it as a starting point:

df['String'] = df.String.str.slice(start=df.String.str.find('-')+1)

Upvotes: 0

Views: 297

Answers (2)

dstrants

Reputation: 7705

I think you can use .split instead of .slice, so that you don't have to worry about the index of the -. Something like this is more suitable in my opinion:

df['String'] = df.String.str.split(' - ').apply(lambda x: x[-1])

Note: this method also removes the whitespace around the -. If you need to keep the whitespace after the dash in the resulting string, split on the dash alone rather than on the dash with its surrounding spaces, like this:

df['String'] = df.String.str.split('-').apply(lambda x: x[-1])
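To make the difference between the two separators concrete, here is a small sketch run on the sample data from the question. The variable names `with_spaces` and `dash_only` are just illustrative.

```python
import pandas as pd

s = pd.Series(["Day 1 - abc", "Day 2 - bcd", "DAY 10 - DFE"])

# Separator includes the spaces, so they are dropped along with the dash.
with_spaces = s.str.split(" - ").apply(lambda parts: parts[-1])

# Separator is the dash alone, so the space after it stays in the result.
dash_only = s.str.split("-").apply(lambda parts: parts[-1])

print(with_spaces.tolist())  # ['abc', 'bcd', 'DFE']
print(dash_only.tolist())    # [' abc', ' bcd', ' DFE']
```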

Update

As @satilog pointed out in their answer, you need a lambda to take the last element of each list returned by .split(). I have fixed the code above accordingly.
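As a possible alternative to the lambda, pandas also lets you index into each list with the `.str[-1]` accessor. The sketch below additionally passes `n=1` so that only the first " - " is used as a separator. The third row is made-up data, added only to show what happens when the text itself contains " - ".

```python
import pandas as pd

# The last row is hypothetical: its payload contains another " - ".
df = pd.DataFrame({"String": ["Day 1 - abc", "Day 2 - bcd", "Day 3 - x - y"]})

# Split at most once (n=1), then take the last piece of each list
# with the .str[-1] accessor instead of .apply(lambda x: x[-1]).
df["String"] = df["String"].str.split(" - ", n=1).str[-1]

print(df["String"].tolist())  # ['abc', 'bcd', 'x - y']
```

Without `n=1`, the last row would be reduced to just "y", which may or may not be what you want.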

Upvotes: 0

satilog

Reputation: 310

You can use .split on each row, splitting on a space (" "), and then apply a lambda function to retrieve the last element of the list in each row. Note that this relies on the text after the dash containing no spaces of its own.

Code:

import pandas as pd

df = pd.DataFrame(data=["Day 1 - abc", "Day 2 - bcd", "DAY 10 - DFE"], columns=["String"])
df["String"] = df.String.str.split(" ").apply(lambda x: x[-1])

Output:

  String
0    abc
1    bcd
2    DFE
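Splitting on a space works for the sample data, but it would cut the value short if the part after the dash ever contained a space. As a rough sketch of one way around that, `Series.str.partition` splits at the first occurrence of a separator and returns three columns (before, separator, after). The last row below is hypothetical data, used only to show the difference.

```python
import pandas as pd

# Hypothetical data: the last row has a space in the part we want to keep.
df = pd.DataFrame({"String": ["Day 1 - abc", "Day 2 - bcd", "DAY 10 - DFE ghi"]})

# Last space-separated token: loses "DFE" in the last row.
by_space = df["String"].str.split(" ").apply(lambda x: x[-1])

# Everything after the first " - ": column 2 of the partition result.
by_dash = df["String"].str.partition(" - ")[2]

print(by_space.tolist())  # ['abc', 'bcd', 'ghi']
print(by_dash.tolist())   # ['abc', 'bcd', 'DFE ghi']
```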

Upvotes: 1
