Philip Hutchinson
Philip Hutchinson

Reputation: 15

Extract prefix from string in dataframe column where exists in a list

Looking for some help. I have a pandas dataframe column and I want to extract the prefix where such prefix exists in a separate list.

pr_list = ['1 FO-','2 IA-']

Column in df is like

PartNumber     
ABC
DEF
1 FO-BLABLA
2 IA-EXAMPLE

What I am looking for is to extract the prefix where present, put in a new column and leave the rest of the string in the original column.

PartNumber   Prefix
ABC          
DEF
BLABLA       1 FO-
EXAMPLE      2 IA-

Have tried some things like str.startswith but a bit of a python novice and wasn't able to get it to work.

much appreciated

EDIT Both solutions below work on the test data, however I am getting an error
error: nothing to repeat at position 16
Which suggests something askew in my dataset. Not sure what position 16 refers to but looking at both the prefix list and PartNumber column in position 16 nothing seems out of the ordinary?

EDIT 2 I have traced it to have an * in the pr_list seems to be throwing it. is * some reserved character? is there a way to break it out so it is read as text?

Upvotes: 1

Views: 615

Answers (2)

YusufUMS
YusufUMS

Reputation: 1493

Maybe it's not what you are looking for, but may it help.

import pandas as pd

pr_list = ['1 FO-','2 IA-']
df = pd.DataFrame({'PartNumber':['ABC','DEF','1 FO-BLABLA','2 IA-EXAMPLE']})

extr = '|'.join(x for x in pr_list)
df['Prefix'] = df['PartNumber'].str.extract('('+ extr + ')', expand=False).fillna('')
df['PartNumber'] = df['PartNumber'].str.replace('|'.join(pr_list),'')
df

Upvotes: 0

anky
anky

Reputation: 75120

You can try:

df['Prefix']=df.PartNumber.str.extract(r'({})'.format('|'.join(pr_list))).fillna('')
df.PartNumber=df.PartNumber.str.replace('|'.join(pr_list),'')
print(df)

  PartNumber Prefix
0        ABC       
1        DEF       
2     BLABLA  1 FO-
3    EXAMPLE  2 IA-

Upvotes: 1

Related Questions