Reputation: 21
I have a pandas DataFrame with a column named AA_IDs. In a few rows, the column values contain the special delimiter "-#". I need to determine three things:
E.g. AFB001 9183Daily-#789876A
The answer would be, before the delimiter: AFB001 9183Daily
and after the delimiter: 789876A
Upvotes: 0
Views: 1424
Reputation: 19322
Just use the apply function with split:
df['AA_IDs'].apply(lambda x: x.split('-#'))
This should give you a Series with a list for each row, e.g. ['AFB001 9183Daily', '789876A']. Rows that don't contain the delimiter come back as a single-element list.
This is typically faster than using a regex, and it's more readable too.
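If separate columns are more convenient than a list per row, the same split can be sketched with the vectorized string accessor. This is only an illustration: the sample rows and the column names before and after are made up, and it assumes every value is a string.

```python
import pandas as pd

# Hypothetical sample data; the second row has no "-#" delimiter
df = pd.DataFrame({"AA_IDs": ["AFB001 9183Daily-#789876A", "XYZ123"]})

# Split at most once on "-#" and expand the pieces into separate columns
parts = df["AA_IDs"].str.split("-#", n=1, expand=True)

df["before"] = parts[0]  # text before the delimiter (whole value if no delimiter)
df["after"] = parts[1]   # text after the delimiter (missing if no delimiter)
```

Using n=1 keeps everything after the first "-#" together, in case the delimiter ever appears more than once in a value.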
Upvotes: 2
Reputation: 63
So let's say the DataFrame is called df and the column with the text is A.
You can use
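import re  # optional: only needed if you pass flags such as re.IGNORECASE
pattern = r'<your regex>'  # must contain at least one capture group, e.g. (...)
df['one'] = df['A'].str.extract(pattern)  # with several groups, assign to several columns instead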
This creates a new column containing the extracted text. You just need to create a regex to extract what you want from your string(s). I highly recommend regex101 to help you construct your regex.
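For the "-#" case from the question, one possible pattern might look like the sketch below. The sample rows and the group names before and after are assumptions for illustration, not something from the original post.

```python
import pandas as pd

# Hypothetical sample data; the second row has no "-#" delimiter
df = pd.DataFrame({"AA_IDs": ["AFB001 9183Daily-#789876A", "XYZ123"]})

# Two named groups: text before the first "-#" and text after it
pattern = r"^(?P<before>.*?)-#(?P<after>.*)$"

# One output column per named group; rows that don't match become NaN
extracted = df["AA_IDs"].str.extract(pattern)
```

Note that rows without the delimiter end up missing in both columns here, so this approach also tells you which rows contain "-#" at all.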
Hope this helps!
Upvotes: 0