Reputation: 441
I have a dataset with week end dates, user ids and an assigned score per week. I want to create a new table with the week end dates as columns and the user ids as rows. It would then match the user ID from the row with the week end date in the column and pull in the correct score. I am able to create the dataframe but can't match in the correct column, any help on this would be really appreciated
It should look something like this:
01/01/2020 07/01/2020 14/01/2020 21/01/2020
1 0.8 0.8 0.95 0.66
2 0.9 0.6 0.55 0.56
3 1.0 0.4 0.66 0.9
4 0.5 0.2 0.77 0.8
This is the data:
pd.DataFrame([{'week_end_date': {1: '2020-09-25',
662: '2020-08-14',
1719: '2020-10-16',
5780: '2020-09-11',
8589: '2020-09-18',
14569: '2020-10-09',
15334: '2020-10-30',
19864: '2020-10-02',
21641: '2020-11-06',
24117: '2020-09-25',
33090: '2020-10-02',
38253: '2020-09-04',
44078: '2020-09-11',
53185: '2020-09-18',
56337: '2020-10-09',
59212: '2020-10-23',
63782: '2020-11-13',
64287: '2020-10-16',
65208: '2020-11-13',
65483: '2020-10-23',
66264: '2020-09-25',
70286: '2020-09-11',
76640: '2020-10-16',
77288: '2020-10-09',
77371: '2020-10-30',
87259: '2020-11-06',
91926: '2020-11-06',
95279: '2020-09-04',
104082: '2020-10-30',
110999: '2020-11-13',
111022: '2020-08-21',
112969: '2020-10-23',
115865: '2020-09-18',
119142: '2020-10-02'},
'actor_id': {1: 100000224549517,
662: 100000168029563,
1719: 100000168029563,
5780: 100000151529577,
8589: 100000151529577,
14569: 100000224549517,
15334: 100000151529577,
19864: 100000151529577,
21641: 100000224549517,
24117: 100000168029563,
33090: 100000168029563,
38253: 100000151529577,
44078: 100000168029563,
53185: 100000168029563,
56337: 100000151529577,
59212: 100000168029563,
63782: 100000151529577,
64287: 100000151529577,
65208: 100000168029563,
65483: 100000151529577,
66264: 100000151529577,
70286: 100000224549517,
76640: 100000224549517,
77288: 100000168029563,
77371: 100000224549517,
87259: 100000168029563,
91926: 100000151529577,
95279: 100000224549517,
104082: 100000168029563,
110999: 100000224549517,
111022: 100000168029563,
112969: 100000224549517,
115865: 100000224549517,
119142: 100000224549517},
'accuracy_score': {1: 0.894736842105263,
662: 0.9523809523809521,
1719: 0.904850746268656,
5780: 0.877551020408163,
8589: 0.7884615384615381,
14569: 0.8571428571428571,
15334: 0.972447325769854,
19864: 0.8536585365853651,
21641: 0.93717277486911,
24117: 0.9,
33090: 0.893617021276595,
38253: 0.8444444444444441,
44078: 1.0,
53185: 0.9047619047619041,
56337: 0.9189189189189191,
59212: 0.957627118644067,
63782: 0.9708737864077671,
64287: 0.8690476190476191,
65208: 0.8983050847457621,
65483: 0.934537246049661,
66264: 0.9487179487179481,
70286: 0.86,
76640: 0.855155482815057,
77288: 0.9302325581395341,
77371: 0.8785607196401791,
87259: 0.875912408759124,
91926: 0.8617021276595741,
95279: 0.8,
104082: 0.9822560202788341,
110999: 1.0,
111022: 0.6666666666666661,
112969: 0.8824101068999021,
115865: 0.8823529411764701,
119142: 0.8478260869565211}}])
Upvotes: 1
Views: 338
Reputation: 473
You coud pivot your data:
df = pd.DataFrame.from_dict({'week_end_date': {1: '2020-09-25',
662: '2020-08-14',
1719: '2020-10-16',
5780: '2020-09-11',
8589: '2020-09-18',
14569: '2020-10-09',
15334: '2020-10-30',
19864: '2020-10-02',
21641: '2020-11-06',
24117: '2020-09-25',
33090: '2020-10-02',
38253: '2020-09-04',
44078: '2020-09-11',
53185: '2020-09-18',
56337: '2020-10-09',
59212: '2020-10-23',
63782: '2020-11-13',
64287: '2020-10-16',
65208: '2020-11-13',
65483: '2020-10-23',
66264: '2020-09-25',
70286: '2020-09-11',
76640: '2020-10-16',
77288: '2020-10-09',
77371: '2020-10-30',
87259: '2020-11-06',
91926: '2020-11-06',
95279: '2020-09-04',
104082: '2020-10-30',
110999: '2020-11-13',
111022: '2020-08-21',
112969: '2020-10-23',
115865: '2020-09-18',
119142: '2020-10-02'},
'actor_id': {1: 100000224549517,
662: 100000168029563,
1719: 100000168029563,
5780: 100000151529577,
8589: 100000151529577,
14569: 100000224549517,
15334: 100000151529577,
19864: 100000151529577,
21641: 100000224549517,
24117: 100000168029563,
33090: 100000168029563,
38253: 100000151529577,
44078: 100000168029563,
53185: 100000168029563,
56337: 100000151529577,
59212: 100000168029563,
63782: 100000151529577,
64287: 100000151529577,
65208: 100000168029563,
65483: 100000151529577,
66264: 100000151529577,
70286: 100000224549517,
76640: 100000224549517,
77288: 100000168029563,
77371: 100000224549517,
87259: 100000168029563,
91926: 100000151529577,
95279: 100000224549517,
104082: 100000168029563,
110999: 100000224549517,
111022: 100000168029563,
112969: 100000224549517,
115865: 100000224549517,
119142: 100000224549517},
'accuracy_score': {1: 0.894736842105263,
662: 0.9523809523809521,
1719: 0.904850746268656,
5780: 0.877551020408163,
8589: 0.7884615384615381,
14569: 0.8571428571428571,
15334: 0.972447325769854,
19864: 0.8536585365853651,
21641: 0.93717277486911,
24117: 0.9,
33090: 0.893617021276595,
38253: 0.8444444444444441,
44078: 1.0,
53185: 0.9047619047619041,
56337: 0.9189189189189191,
59212: 0.957627118644067,
63782: 0.9708737864077671,
64287: 0.8690476190476191,
65208: 0.8983050847457621,
65483: 0.934537246049661,
66264: 0.9487179487179481,
70286: 0.86,
76640: 0.855155482815057,
77288: 0.9302325581395341,
77371: 0.8785607196401791,
87259: 0.875912408759124,
91926: 0.8617021276595741,
95279: 0.8,
104082: 0.9822560202788341,
110999: 1.0,
111022: 0.6666666666666661,
112969: 0.8824101068999021,
115865: 0.8823529411764701,
119142: 0.8478260869565211}})
df.pivot(index="actor_id", columns=['week_end_date'])
Output:
accuracy_score ...
week_end_date 2020-08-14 2020-08-21 ... 2020-11-06 2020-11-13
actor_id ...
100000151529577 NaN NaN ... 0.861702 0.970874
100000168029563 0.952381 0.666667 ... 0.875912 0.898305
100000224549517 NaN NaN ... 0.937173 1.000000
Upvotes: 1