Reputation: 109
I want to be able to turn a groupby into a single row, but the values of a second column in that groupby to be aggregated into new columns or -99, if there isn't sufficient data.
After we group by session_id with this input:
user_id session_id timestamp step impressions n_clicks
0 004A07DM0IDW 1d688ec168932 1541555799 7 2059240 5.0
1 004A07DM0IDW 1d688ec168932 1541555799 7 2033381 3.0
2 004A07DM0IDW 1d688ec168932 1541555799 7 1724779 4.0
3 004A07DM0IDW 1d688ec168932 1541555799 7 127131 2.0
4 004A07DM0IDW 1d688ec168932 1541555799 7 399441 1.0
5 004A07DM0IDW 1d688ec168932 1541555799 7 103357 3.0
6 004A07DM0IDW 1d688ec168932 1541555799 7 127132 3.0
7 004A07DM0IDW 1d688ec168932 1541555799 7 1167004 1.0
8 004A07DM0IDW 1d688ec168932 1541555799 7 4491766 4.0
9 004A07DM0IDW 1d688ec168932 1541555799 7 2249874 5.0
10 00Y1Z24X8084 26b6d294d66e7 1541651823 3 4476010 4.0
11 00Y1Z24X8084 26b6d294d66e7 1541651823 3 3843244 5.0
I'd like to produce this output
user_id session_id timestamp step count_0 count_1 count_2 count... count_24
0 004A07DM0IDW 1d688ec168932 1541555799 7 5.0 3.0 4.0 2.0 -99
1 00Y1Z24X8084 26b6d294d66e7 1541555799 3 4.0 5.0 -99 -99 -99
What we're looking is that the user_id
session_id
timestamp
step
will always be the same for each row. However, the impressions are different. For each of the rows (there will be max 25), the value in the click column is mapped onto a count_x
, however, if the number of rows is not sufficient, the subsequent values will take -99.
Since there are 10 rows in the first groupby frame, it means that columns count_10
to count_24
will have -99 as the value. For the second groupby frame columns count_2
to count_24
will have -99.
Upvotes: 1
Views: 348
Reputation: 862691
Use:
cols = ['user_id','session_id','timestamp','step']
df['g'] = df.groupby(cols).cumcount()
df = (df.set_index(cols + ['g'])['n_clicks']
.unstack(fill_value=-99)
.reindex(range(25), fill_value=-99, axis=1)
.add_prefix('count_')
.reset_index()
.rename_axis(None, axis=1))
print (df)
user_id session_id timestamp step count_0 count_1 count_2 \
0 004A07DM0IDW 1d688ec168932 1541555799 7 5.0 3.0 4.0
1 00Y1Z24X8084 26b6d294d66e7 1541651823 3 4.0 5.0 -99.0
count_3 count_4 count_5 ... count_15 count_16 count_17 count_18 \
0 2.0 1.0 3.0 ... -99 -99 -99 -99
1 -99.0 -99.0 -99.0 ... -99 -99 -99 -99
count_19 count_20 count_21 count_22 count_23 count_24
0 -99 -99 -99 -99 -99 -99
1 -99 -99 -99 -99 -99 -99
[2 rows x 29 columns]
Explanation:
GroupBy.cumcount
for counterDataFrame.set_index
and reshape by Series.unstack
range(25)
by DataFrame.reindex
DataFrame.add_prefix
DataFrame.rename_axis
with
DataFrame.reset_index
Upvotes: 3