Swati

Help needed with startswith() in a pandas DataFrame

I have a name column in a DataFrame that contains multiple names.

DataFrame

import pandas as pd

df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})

OUTPUT

                                   name
0  Brailey, Mr. William Theodore Ronald
1                   Roger Marie Bricoux
2           Mr. Roderick Robert Crispin
3                            Cunningham
4                    Mr. Alfred Fleming

I wrote a row classification function: if I pass it a row, it should return the class for that name.

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux', 'John Frederick Preston Clarke']
def classify_role(row):
    if row.loc['name'] in mus:
        return 'musician'

Calling the function:

is_brailey = df['name'].str.startswith('Brailey')
print(classify_role(df[is_brailey].iloc[0])) 

This should print 'musician', but the output shows a different class. I think I am doing something wrong in classify_role(), most likely in the line if row.loc['name'] in mus:
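To narrow down where the problem is, the intermediate values can be inspected one at a time. This is a minimal sketch on the sample data above; it only prints what the mask, the selected row and the membership test produce, and does not change the classification logic.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})
mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']

# Boolean mask: one True/False value per row
is_brailey = df['name'].str.startswith('Brailey')
print(is_brailey.tolist())     # [True, False, False, False, False]

# First matching row, returned as a Series indexed by column name
row = df[is_brailey].iloc[0]
print(repr(row.loc['name']))   # 'Brailey, Mr. William Theodore Ronald'

# Exact string comparison: stray whitespace in the data would make this False
print(row.loc['name'] in mus)  # True
```

If the last line prints False on the real data, the stored names probably differ from the list entries (for example, leading or trailing spaces).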

Summary: I need a solution where, if I pass the first part of a person's name to startswith() and that person is in mus, it returns 'musician'.
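The behaviour described in the summary could look roughly like this. It is a minimal sketch, assuming only the first row whose name starts with the prefix matters; classify_by_prefix is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})
mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']


def classify_by_prefix(df, prefix):
    """Hypothetical helper: return 'musician' if the first name that
    starts with `prefix` is in mus, otherwise None."""
    matches = df.loc[df['name'].str.startswith(prefix), 'name']
    if matches.empty:
        return None  # no name starts with this prefix
    # Compare the plain string (not the whole row) against the list
    return 'musician' if matches.iloc[0] in mus else None


print(classify_by_prefix(df, 'Brailey'))     # musician
print(classify_by_prefix(df, 'Cunningham'))  # None
```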

Answers (1)

jezrael

EDIT: If you want to test whether values exist in lists, you can create a dictionary of lists and test membership with Series.isin:

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']
cat1 = ['Mr. Alfred Fleming', 'Cunningham']

d = {'musician': mus, 'category': cat1}

# Label every row whose name is in one of the lists with that list's key
for k, v in d.items():
    df.loc[df['name'].isin(v), 'type'] = k
print(df)
                                   name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham  category
4                    Mr. Alfred Fleming  category
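A variation on the same idea (an alternative, not part of the original answer) is to invert the dictionary once, so that each name maps to its category, and then look every name up with Series.map. A minimal sketch, assuming each name appears in at most one list:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']
cat1 = ['Mr. Alfred Fleming', 'Cunningham']
d = {'musician': mus, 'category': cat1}

# Invert the dictionary: name -> category.
# If a name appeared in several lists, the last key would win.
lookup = {name: k for k, names in d.items() for name in names}

# Names missing from the lookup become NaN
df['type'] = df['name'].map(lookup)
print(df)
```

This avoids one pass over the column per category, which may matter when there are many categories.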

Your solution can also be changed so that the function is applied to the name column; each value passed in is then a plain string:

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']

def classify_role(name):
    # Series.apply passes each value of the column, i.e. a string
    if name in mus:
        return 'musician'
    # names not in mus fall through and return None

df['type'] = df['name'].apply(classify_role)
print(df)
                                   name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin      None
3                            Cunningham      None
4                    Mr. Alfred Fleming      None
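For a single category, the same result can usually be obtained without apply by combining the vectorized isin mask with numpy.where. This is an alternative sketch, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})
mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']

# 'musician' where the name is in mus, a missing value everywhere else
df['type'] = np.where(df['name'].isin(mus), 'musician', None)
print(df)
```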

You can also pass a tuple of prefixes to Series.str.startswith, so the solution can be expanded to match more categories with a dictionary:

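d = {'musician': ['Brailey, Mr. William Theodore Ronald'],
     'cat1': ['Roger Marie Bricoux', 'Cunningham']}

# Starting from df without a 'type' column, so unmatched rows are NaN.
# A row matches if its name starts with any prefix in the tuple.
for k, v in d.items():
    df.loc[df['name'].str.startswith(tuple(v)), 'type'] = k
print(df)
                                   name      type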
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux      cat1
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham      cat1
4                    Mr. Alfred Fleming       NaN
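Because startswith only compares the beginning of each string, the lists do not need to contain full names. A minimal sketch, assuming the hypothetical short prefixes below identify each person uniquely:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})

# Hypothetical prefixes: only the first word of each name is listed
prefixes = {'musician': ['Brailey', 'Roger'],
            'cat1': ['Cunningham']}

for k, v in prefixes.items():
    df.loc[df['name'].str.startswith(tuple(v)), 'type'] = k
print(df)
```

Short prefixes are convenient but can match unintended rows (for example, any other name beginning with 'Roger'), so longer prefixes are safer on real data.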
