Reputation: 15
Looking for some help. I have a pandas dataframe column and I want to extract the prefix where such prefix exists in a separate list.
pr_list = ['1 FO-','2 IA-']
Column in df is like
PartNumber
ABC
DEF
1 FO-BLABLA
2 IA-EXAMPLE
What I am looking for is to extract the prefix where present, put in a new column and leave the rest of the string in the original column.
PartNumber Prefix
ABC
DEF
BLABLA 1 FO-
EXAMPLE 2 IA-
Have tried some things like str.startswith but a bit of a python novice and wasn't able to get it to work.
much appreciated
EDIT
Both solutions below work on the test data, however I am getting an error
error: nothing to repeat at position 16
Which suggests something askew in my dataset. Not sure what position 16 refers to but looking at both the prefix list and PartNumber column in position 16 nothing seems out of the ordinary?
EDIT 2 I have traced it to have an * in the pr_list seems to be throwing it. is * some reserved character? is there a way to break it out so it is read as text?
Upvotes: 1
Views: 615
Reputation: 1493
Maybe it's not what you are looking for, but may it help.
import pandas as pd
pr_list = ['1 FO-','2 IA-']
df = pd.DataFrame({'PartNumber':['ABC','DEF','1 FO-BLABLA','2 IA-EXAMPLE']})
extr = '|'.join(x for x in pr_list)
df['Prefix'] = df['PartNumber'].str.extract('('+ extr + ')', expand=False).fillna('')
df['PartNumber'] = df['PartNumber'].str.replace('|'.join(pr_list),'')
df
Upvotes: 0
Reputation: 75120
You can try:
df['Prefix']=df.PartNumber.str.extract(r'({})'.format('|'.join(pr_list))).fillna('')
df.PartNumber=df.PartNumber.str.replace('|'.join(pr_list),'')
print(df)
PartNumber Prefix
0 ABC
1 DEF
2 BLABLA 1 FO-
3 EXAMPLE 2 IA-
Upvotes: 1