Reputation: 1
In my dataframe I want to slice my strings to remove a preamble from the data. The only trouble is that the preamble varies in length, so I need to work out where the real content is supposed to start.
Before:
Day 1 - abc
Day 2 - bcd
DAY 10 - DFE
After:
abc
bcd
DFE
I understand why the following doesn't work, but thought I would provide it as a starting point:
df['String'] = df.String.str.slice(start=df.String.str.find('-')+1)
Upvotes: 0
Views: 297
Reputation: 7705
I think you can use .split instead of .slice so that you don't have to worry about the index of the -. Something like this is more suitable, in my opinion:
df['String'] = df.String.str.split(' - ').apply(lambda x: x[-1])
Note
This method also removes the whitespace around the -. If you need to keep the whitespace after the dash in the resulting string, split on the dash alone rather than on the dash with its surrounding spaces, like:
df['String'] = df.String.str.split('-').apply(lambda x: x[-1])
Edit: as @satilog pointed out in their answer, you need a lambda to take the last element of the list returned by .split(). I have fixed the code here.
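As a possible variation on the split approach, here is a minimal sketch that avoids .apply by indexing the split result with .str[-1], and passes n=1 so that only the first " - " is treated as the separator. The fourth row is a hypothetical example added to show that case; it is not from the question.

```python
import pandas as pd

# The first three rows come from the question; the last row is a
# hypothetical example with a second " - " inside the content.
df = pd.DataFrame(
    {"String": ["Day 1 - abc", "Day 2 - bcd", "DAY 10 - DFE", "Day 3 - x - y"]}
)

# n=1 splits at the first " - " only, so the list has at most two parts.
# .str[-1] takes the last part of each list without needing a lambda.
df["String"] = df["String"].str.split(" - ", n=1).str[-1]

print(df["String"].tolist())
```

Rows that contain no " - " at all come back unchanged, because the split then yields a single-element list.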
Upvotes: 0
Reputation: 310
You can use .split on each row to split by a " ", and then apply a lambda function to retrieve the last element of the list in each row.
Code:
import pandas as pd
df = pd.DataFrame(data=["Day 1 - abc", "Day 2 - bcd", "DAY 10 - DFE"], columns=["String"])
df["String"] = df.String.str.split(" ").apply(lambda x: x[-1])
Output:
String
0 abc
1 bcd
2 DFE
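One caveat: splitting on " " keeps only the last word, which works for the sample data but would drop words if the content after the dash contained spaces. A rough alternative sketch, assuming the preamble is everything up to and including the first dash, strips it with a regular expression instead. The fourth row is a hypothetical example, not from the question.

```python
import pandas as pd

# The first three rows come from the question; the last row is a
# hypothetical example whose content contains a space.
df = pd.DataFrame(
    {"String": ["Day 1 - abc", "Day 2 - bcd", "DAY 10 - DFE", "Day 11 - two words"]}
)

# ^[^-]*  matches everything from the start up to the first dash,
# -\s*    matches the dash itself and any whitespace that follows it.
df["String"] = df["String"].str.replace(r"^[^-]*-\s*", "", regex=True)

print(df["String"].tolist())
```

Strings without a dash are left unchanged, since the pattern does not match them.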
Upvotes: 1