AvyWam
AvyWam

Reputation: 1002

How to flat specific columns of a Dataframe(pandas)?

I have got a very large dataset of games records from my database. Let's say one dataframe is a part of it, and represent a single game like:

+----------+------+------+-------+----------+------------+-----+----------------+
| _id_game | age  | rank | grade |   time   |   date     | ... |   _id_player   |
+----------+------+------+-------+----------+------------+-----+----------------+
| key2589  | 14.0 |  1.0 | B     | 00:02:34 | 2015/08/02 | ... | maximi-125     |
| key2589  | 28.0 |  2.0 | A     | 00:02:50 | 2015/08/02 | ... | scooby-897     |
| key2589  | 16.0 |  3.0 | B     | 00:03:21 | 2015/08/02 | ... | zorro-003      |
| key2589  | 30.0 |  4.0 | D     | 00:03:45 | 2015/08/02 | ... | barabapapa-007 |
+----------+------+------+-------+----------+------------+-----+----------------+

Each row of this dataframe above represent one player of this game. I want to make appear some characteristics of the all other players of the game for each row (so each player compared to his competitors).

I would like to modify the dataframe like:

+----------+------+------+-------+----------+------------+--------+--------+--------+----------+-----+----------------+
| _id_game | age  | rank | grade |   time   |    date    | p1_age | p2_age | p3_age | p1_grade | ... |   _id_player   |
+----------+------+------+-------+----------+------------+--------+--------+--------+----------+-----+----------------+
| key2589  | 14.0 |  1.0 | B     | 00:02:34 | 2015/08/02 |   28.0 |   16.0 |   30.0 | A        | ... | maximi-125     |
| key2589  | 28.0 |  2.0 | A     | 00:02:50 | 2015/08/02 |   14.0 |   16.0 |   30.0 | B        | ... | scooby-897     |
| key2589  | 16.0 |  3.0 | B     | 00:03:21 | 2015/08/02 |   14.0 |   28.0 |   30.0 | B        | ... | zorro-003      |
| key2589  | 30.0 |  4.0 | D     | 00:03:45 | 2015/08/02 |   14.0 |   28.0 |   16.0 | B        | ... | barabapapa-007 |
+----------+------+------+-------+----------+------------+--------+--------+--------+----------+-----+----------------+

As you can see, I won't make duplication of the fields date or _id_game. Just the specific fields what are players characteristics.

The main idea is to make predictions of one field considering the presence of the other competitors for each individual represented by a row.

I don't know how to do it in a pandas dataframe considering one df is only one game. That is more complicated considering the dataframe contains a bunch of games.

Anyone to help me?

Upvotes: 2

Views: 193

Answers (1)

BENY
BENY

Reputation: 323396

This is a cross join problem , just after join we need filter to exclude duplicate per row , then re format the output

s=df[['_id_game','rank']].merge(df[['_id_game','age','rank','grade']],on='_id_game')#  merge here
s=s[s.rank_x!=s.rank_y]# filter the one already have 
s=s.assign(key=s.groupby(['_id_game','rank_x']).cumcount()+1)# ge tthe key for pivot 
s=s.set_index(['_id_game','rank_x','key'])[['age','grade']].unstack() # reformat 
s.columns=s.columns.map('P{0[1]}_{0[0]}'.format)# flatten the columns 
s
Out[850]: 
                 P1_age  P2_age  P3_age P1_grade P2_grade P3_grade
_id_game rank_x                                                   
key2589  1.0       28.0    16.0    30.0        A        B        D
         2.0       14.0    16.0    30.0        B        B        D
         3.0       14.0    28.0    30.0        B        A        D
         4.0       14.0    28.0    16.0        B        A        B

All you need here is merge back to original dataframe check link

Upvotes: 2

Related Questions