Reputation: 5666
I have a Pandas dataframe that looks as follows.
player count1 count2 text
A 1 1 X
A 2 1 Y
A 3 1 Z
A 4 2 D
A 5 2 E
B 1 1 F
B 2 2 G
B 3 2 H
B 4 2 J
Column player
contains names, count1
is a cumulative sum, column count2
contains other counts, and column text
contains some text.
I now want to create 2 new columns that contain the values of count1
and text
where the column count2
first contains the value 2
.
Hence, the result should look like this:
player count1 count2 text new new2
A 1 1 X 4 D
A 2 1 Y 4 D
A 3 1 Z 4 D
A 4 2 D 4 D
A 5 2 E 4 D
B 1 1 F 2 G
B 2 2 G 2 G
B 3 2 H 2 G
B 4 2 J 2 G
I already asked a similar question, but where only, one new column should be added [here][1].
The answer was to use map
by Series
.
s = df[df['count2'] == 2].drop_duplicates(['player']).set_index('player')['count1']
df['new'] = df['player'].map(s)
However, when I try to apply this approach to two columns, it does not work.
I try it like this:
s = df[df['count2'] == 2].drop_duplicates(['player']).set_index('player')[['count1', 'text']]
df[['new', 'new2']] = df['player'].map(s)
This yields the following error:
TypeError: 'DataFrame' object is not callable
How can I get this to work?
Upvotes: 3
Views: 434
Reputation: 142106
You can filter on count2 == 2, drop duplicates by player, then merge the result back to your original DF on player, eg:
new = df.merge(
df.loc[df.count2 == 2, ['player', 'count1', 'text']]
.drop_duplicates(subset=['player']),
on='player'
)
Which gives you:
player count1_x count2 text_x count1_y text_y
0 A 1 1 X 4 D
1 A 2 1 Y 4 D
2 A 3 1 Z 4 D
3 A 4 2 D 4 D
4 A 5 2 E 4 D
5 B 1 1 F 2 G
6 B 2 2 G 2 G
7 B 3 2 H 2 G
8 B 4 2 J 2 G
Upvotes: 2