Reputation: 1002
I have got a very large dataset of games records from my database. Let's say one dataframe is a part of it, and represent a single game like:
+----------+------+------+-------+----------+------------+-----+----------------+
| _id_game | age | rank | grade | time | date | ... | _id_player |
+----------+------+------+-------+----------+------------+-----+----------------+
| key2589 | 14.0 | 1.0 | B | 00:02:34 | 2015/08/02 | ... | maximi-125 |
| key2589 | 28.0 | 2.0 | A | 00:02:50 | 2015/08/02 | ... | scooby-897 |
| key2589 | 16.0 | 3.0 | B | 00:03:21 | 2015/08/02 | ... | zorro-003 |
| key2589 | 30.0 | 4.0 | D | 00:03:45 | 2015/08/02 | ... | barabapapa-007 |
+----------+------+------+-------+----------+------------+-----+----------------+
Each row of this dataframe above represent one player of this game. I want to make appear some characteristics of the all other players of the game for each row (so each player compared to his competitors).
I would like to modify the dataframe like:
+----------+------+------+-------+----------+------------+--------+--------+--------+----------+-----+----------------+
| _id_game | age | rank | grade | time | date | p1_age | p2_age | p3_age | p1_grade | ... | _id_player |
+----------+------+------+-------+----------+------------+--------+--------+--------+----------+-----+----------------+
| key2589 | 14.0 | 1.0 | B | 00:02:34 | 2015/08/02 | 28.0 | 16.0 | 30.0 | A | ... | maximi-125 |
| key2589 | 28.0 | 2.0 | A | 00:02:50 | 2015/08/02 | 14.0 | 16.0 | 30.0 | B | ... | scooby-897 |
| key2589 | 16.0 | 3.0 | B | 00:03:21 | 2015/08/02 | 14.0 | 28.0 | 30.0 | B | ... | zorro-003 |
| key2589 | 30.0 | 4.0 | D | 00:03:45 | 2015/08/02 | 14.0 | 28.0 | 16.0 | B | ... | barabapapa-007 |
+----------+------+------+-------+----------+------------+--------+--------+--------+----------+-----+----------------+
As you can see, I won't make duplication of the fields date or _id_game. Just the specific fields what are players characteristics.
The main idea is to make predictions of one field considering the presence of the other competitors for each individual represented by a row.
I don't know how to do it in a pandas dataframe considering one df is only one game. That is more complicated considering the dataframe contains a bunch of games.
Anyone to help me?
Upvotes: 2
Views: 193
Reputation: 323396
This is a cross join problem , just after join we need filter to exclude duplicate per row , then re format the output
s=df[['_id_game','rank']].merge(df[['_id_game','age','rank','grade']],on='_id_game')# merge here
s=s[s.rank_x!=s.rank_y]# filter the one already have
s=s.assign(key=s.groupby(['_id_game','rank_x']).cumcount()+1)# ge tthe key for pivot
s=s.set_index(['_id_game','rank_x','key'])[['age','grade']].unstack() # reformat
s.columns=s.columns.map('P{0[1]}_{0[0]}'.format)# flatten the columns
s
Out[850]:
P1_age P2_age P3_age P1_grade P2_grade P3_grade
_id_game rank_x
key2589 1.0 28.0 16.0 30.0 A B D
2.0 14.0 16.0 30.0 B B D
3.0 14.0 28.0 30.0 B A D
4.0 14.0 28.0 16.0 B A B
All you need here is merge
back to original dataframe check link
Upvotes: 2