Reputation: 1722
I have a dataset:
>>> all_transcripts
ID Type Name
1 Guest Hugo
1 Guest Hugo
1 Boss Boss
1 Boss Boss
2 Boss Boss
2 Guest Calvin
2 Guest Calvin
3 Guest Klein
3 Boss Boss
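For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a sketch: only the printed output is shown, so integer IDs and plain string columns are assumptions.

```python
import pandas as pd

# Rebuild the example data; the dtypes are assumed, since only the
# printed frame is shown in the question.
all_transcripts = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

print(all_transcripts)
```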
Now I want to create a column called nameGuest that contains the name of the guest for each ID, repeated on every row of that ID. My desired output looks as follows:
>>> all_transcripts
ID Type Name nameGuest
1 Guest Hugo Hugo
1 Guest Hugo Hugo
1 Boss Boss Hugo
1 Boss Boss Hugo
2 Boss Boss Calvin
2 Guest Calvin Calvin
2 Guest Calvin Calvin
3 Guest Klein Klein
3 Boss Boss Klein
How can I do this?
Upvotes: 1
Views: 37
Reputation: 42886
Groupby.first
You can filter the rows where Type is Guest, then use groupby on ID and select the first Name while aggregating. This gives us a Series of guest names indexed by ID, which we can map back onto the dataframe to create the new column:
names = df[df['Type'] == 'Guest'].groupby('ID')['Name'].first()
df['nameGuest'] = df['ID'].map(names)
print(df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
Output of names:
print(names)
ID
1 Hugo
2 Calvin
3 Klein
Name: Name, dtype: object
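One thing worth checking is what happens when an ID has no Guest row at all. The sketch below uses a smaller made-up frame (ID 4 is a hypothetical addition, not part of the question's data) to show that map leaves NaN for IDs missing from names:

```python
import pandas as pd

# Small made-up frame; ID 4 is hypothetical and has no Guest row.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 4],
    "Type": ["Guest", "Boss", "Boss", "Guest", "Boss"],
    "Name": ["Hugo", "Boss", "Boss", "Calvin", "Boss"],
})

# Lookup Series: first guest name per ID (index = ID, values = Name).
names = df[df["Type"] == "Guest"].groupby("ID")["Name"].first()

# IDs that are absent from the lookup index come back as NaN.
df["nameGuest"] = df["ID"].map(names)

print(df)
```

If a placeholder is preferred over NaN, chaining .fillna(...) after the map is one option.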
Upvotes: 1
Reputation: 862511
Use Series.map with a helper Series created by boolean indexing, DataFrame.drop_duplicates and DataFrame.set_index to get the first Guest name per group:
s = df[df['Type'] == 'Guest'].drop_duplicates('ID').set_index('ID')['Name']
df['nameGuest'] = df['ID'].map(s)
print(df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
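As a quick sanity check, the drop_duplicates lookup and a groupby-based lookup can be compared on the question's data. This is a sketch: the frame is rebuilt by hand, and the variable names guests, by_dedup and by_groupby are illustrative only.

```python
import pandas as pd

# Example data rebuilt by hand from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

guests = df[df["Type"] == "Guest"]

# Lookup built by keeping the first Guest row per ID.
by_dedup = guests.drop_duplicates("ID").set_index("ID")["Name"]

# Lookup built by aggregating with groupby + first.
by_groupby = guests.groupby("ID")["Name"].first()

# Both lookups map to the same column on this data.
same = df["ID"].map(by_dedup).equals(df["ID"].map(by_groupby))
print(same)
```

The two can differ when a guest's Name is missing: GroupBy.first skips null values by default, whereas drop_duplicates keeps the first row as-is.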
Upvotes: 2