Reputation: 51
I have a name column in a DataFrame that contains multiple names.
DataFrame
import pandas as pd
df = pd.DataFrame({'name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})
OUTPUT
Name
0 Brailey, Mr. William Theodore Ronald
1 Roger Marie Bricoux
2 Mr. Roderick Robert Crispin
3 Cunningham
4 Mr. Alfred Fleming
I wrote a row classification function: if I pass it a row, it should return that row's class.
mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux', 'John Frederick Preston Clarke']
def classify_role(row):
    if row.loc['name'] in mus:
        return 'musician'
Calling the function:
is_brailey = df['name'].str.startswith('Brailey')
print(classify_role(df[is_brailey].iloc[0]))
This should print 'musician'.
But the output shows a different class. I think I wrote something wrong in classify_role().
It must be this line:
if row.loc['name'] in mus:
Summary:
I need a solution where, if I pass startswith() the first name of a person who is in mus, it returns 'musician'.
Upvotes: 2
Views: 137
Reputation: 863226
EDIT: If you want to test whether values exist in lists, you can create a dictionary and test membership with Series.isin:
mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']
cat1 = ['Mr. Alfred Fleming', 'Cunningham']
d = {'musician': mus, 'category': cat1}
for k, v in d.items():
    df.loc[df['Name'].isin(v), 'type'] = k
print(df)
                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham  category
4                    Mr. Alfred Fleming  category
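As a possible alternative to the loop, the same dictionary can be inverted into a name-to-category lookup and passed to Series.map. This is only a sketch: the name_to_type variable is a hypothetical helper, and the DataFrame is rebuilt here with a 'Name' column so the snippet runs on its own.

```python
import pandas as pd

# Sample data, assuming the column is called 'Name' as in the answer above.
df = pd.DataFrame({'Name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']
cat1 = ['Mr. Alfred Fleming', 'Cunningham']
d = {'musician': mus, 'category': cat1}

# Invert {category: [names]} into {name: category} (hypothetical helper dict).
name_to_type = {name: k for k, names in d.items() for name in names}

# Names that are missing from the lookup end up as missing values.
df['type'] = df['Name'].map(name_to_type)
print(df)
```

If a name appears in more than one list, the category listed last in the dictionary wins, which matches the behaviour of the loop.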
Your solution should be changed so that the function receives a single value from the column instead of a whole row:
mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']
def classify_role(row):
    if row in mus:
        return 'musician'
df['type'] = df['Name'].apply(classify_role)
print(df)
                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux  musician
2           Mr. Roderick Robert Crispin      None
3                            Cunningham      None
4                    Mr. Alfred Fleming      None
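If you would rather keep a function that takes a whole row, as in the question, one option is DataFrame.apply with axis=1, which passes each row as a Series. A minimal sketch, assuming the column is named 'Name'; classify_role_row is a hypothetical name used to keep it apart from the function above.

```python
import pandas as pd

# Sample data, assuming the column is called 'Name'.
df = pd.DataFrame({'Name': ['Brailey, Mr. William Theodore Ronald',
                            'Roger Marie Bricoux',
                            'Mr. Roderick Robert Crispin',
                            'Cunningham',
                            'Mr. Alfred Fleming']})

mus = ['Brailey, Mr. William Theodore Ronald', 'Roger Marie Bricoux',
       'John Frederick Preston Clarke']

def classify_role_row(row):
    # row is one DataFrame row (a Series), so the name is read by column label.
    if row.loc['Name'] in mus:
        return 'musician'
    return None

# Classify a single row selected with startswith, as in the question.
is_brailey = df['Name'].str.startswith('Brailey')
single = classify_role_row(df[is_brailey].iloc[0])
print(single)

# Classify every row: axis=1 calls the function once per row.
df['type'] = df.apply(classify_role_row, axis=1)
print(df)
```

The column-wise version with df['Name'].apply is usually simpler, because the function then only has to deal with a plain string.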
You can pass a tuple of values to Series.str.startswith; the solution can then be expanded to match more categories with a dictionary:
d = {'musician': ['Brailey, Mr. William Theodore Ronald'],
     'cat1': ['Roger Marie Bricoux', 'Cunningham']}
for k, v in d.items():
    df.loc[df['Name'].str.startswith(tuple(v)), 'type'] = k
print(df)
                                   Name      type
0  Brailey, Mr. William Theodore Ronald  musician
1                   Roger Marie Bricoux      cat1
2           Mr. Roderick Robert Crispin       NaN
3                            Cunningham      cat1
4                    Mr. Alfred Fleming       NaN
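For classifying a single name by its beginning, as the question's summary asks, Python's built-in str.startswith also accepts a tuple, so a small helper can do the lookup without a DataFrame. This is a sketch under assumptions: the prefixes dictionary and the classify_name function are hypothetical, and the short prefixes are chosen only for illustration.

```python
# Hypothetical mapping from category to the prefixes that identify it.
prefixes = {'musician': ('Brailey', 'Roger'),
            'cat1': ('Cunningham',)}

def classify_name(name):
    # Return the first category whose prefixes match the start of the name.
    for role, starts in prefixes.items():
        if name.startswith(starts):
            return role
    # No prefix matched.
    return None

print(classify_name('Brailey, Mr. William Theodore Ronald'))
print(classify_name('Cunningham'))
print(classify_name('Mr. Alfred Fleming'))
```

Unlike the loop over df.loc, where a later category overwrites an earlier one, this helper stops at the first matching category, so the order of the dictionary matters in the opposite direction.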
Upvotes: 1