Reputation: 1665
I have the following data frame:
df
resid familyid age
54 -0.855952 1028/2 39
55 -0.733972 1028/2 40
56 -0.821011 1028/2 41
57 -0.759187 1028/2 42
58 -0.753066 1028/2 43
65 1.268154 1028/2 35
66 1.214893 1028/2 36
...etc
resid contains N floating point numbers. familyid contains different identifiers (keys) like shown above and age contains 10 values: from 35 to 44. Each familyid is represented twice in this data (i.e they are siblings), and I wish to transform this dataframe to a 2*10 column dataframe where:
The first column contains all the resid values for the first age 35 siblings, the second column contains all resid values for the second age 35 siblings, the third column contains all the resid values for the first age 36 sibling, etc.
I hope this was clear. Thank you
Upvotes: 1
Views: 36
Reputation: 77991
one way to do that is to tag
each sibling as 1
(first) or 2
(second) and then pivot; for example starting with
>>> df
resid familyid age
0 -0.86 1028/2 39
1 -0.73 1028/2 40
2 -0.82 1028/2 41
3 -0.76 1028/2 42
4 -0.75 1028/2 43
5 1.27 1028/2 35
6 1.21 1028/2 36
7 1.71 1028/2 39
8 1.47 1028/2 40
9 1.64 1028/2 41
10 1.52 1028/2 42
11 1.51 1028/2 43
12 -2.54 1028/2 35
13 -2.43 1028/2 36
the tagging part can be done by
>>> tagfn = lambda x: 1 + np.arange(len(x))
>>> df['tag'] = df.groupby(['familyid', 'age']).transform(tagfn)
and then:
>>> df.pivot_table('resid', 'familyid', ['age', 'tag'])
age 35 36 39 40 41 42 43
tag 1 2 1 2 1 2 1 2 1 2 1 2 1 2
familyid
1028/2 1.27 -2.54 1.21 -2.43 -0.86 1.71 -0.73 1.47 -0.82 1.64 -0.76 1.52 -0.75 1.51
[1 rows x 14 columns]
Upvotes: 2