Reputation: 3086
Given a data frame with various numbers of elements in each group ('groupby' by some variable), I need to reshape into a matrix with predefined number of columns. For example:
summary_x participant_id_x response_date cuts
0 3.0 11 2016-05-05 a
1 3.0 11 2016-05-06 a
2 4.0 11 2016-05-07 a
3 4.0 11 2016-05-08 a
4 3.0 11 2016-05-09 a
5 3.0 11 2016-05-10 a
6 3.0 11 2016-05-11 a
7 3.0 11 2016-05-12 a
8 3.0 11 2016-05-13 a
9 3.0 11 2016-05-14 a
13 4.0 11 2016-05-22 b
14 4.0 11 2016-05-23 b
15 3.0 11 2016-05-24 b
16 3.0 11 2016-05-25 b
17 3.0 11 2016-05-26 b
18 3.0 11 2016-05-27 b
19 3.0 11 2016-05-28 b
20 3.0 11 2016-06-02 c
21 3.0 11 2016-06-03 c
22 3.0 11 2016-06-04 c
23 3.0 11 2016-06-05 c
24 3.0 11 2016-06-06 c
25 3.0 11 2016-06-07 c
26 3.0 11 2016-06-08 c
27 3.0 11 2016-06-09 c
28 3.0 11 2016-06-10 c
29 5.0 11 2016-06-11 c
Each group (by'cuts'), contains 10 elements, but the group 'b' contains only 7. I would like to have a matrix from 'summary_x' reshaped as (3,10), where the missing values will be filled in with nans:
pd.DataFrame(df.summary_x.values.reshape((-1,10)))
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 nan nan nan 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
any solutions?
Upvotes: 1
Views: 471
Reputation: 863166
You can use cumcount
with [::-1]
for change order of columns (rows):
g = df.groupby('cuts').cumcount(ascending=False)
df = pd.pivot(index=df['cuts'], columns=g, values=df['summary_x']).iloc[:,::-1]
.reset_index(drop=True)
df.columns = np.arange(len(df.columns))
print (df)
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 NaN NaN NaN 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
Another solution:
L = df[::-1].groupby('cuts')['summary_x'].apply(list).values.tolist()
df = pd.DataFrame(L).iloc[:, ::-1]
df.columns = np.arange(len(df.columns))
print (df)
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 NaN NaN NaN 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
But if NaN
s can be in the end also:
g = df.groupby('cuts').cumcount()
df = pd.pivot(index=df['cuts'], columns=g, values=df['summary_x']).reset_index(drop=True)
print (df)
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 4.0 4.0 3.0 3.0 3.0 3.0 3.0 NaN NaN NaN
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
Upvotes: 1