luffe
luffe

Reputation: 1665

Unstacking column according to two other columns

I have the following data frame:

df 
     resid      familyid age
54  -0.855952    1028/2  39
55  -0.733972    1028/2  40
56  -0.821011    1028/2  41
57  -0.759187    1028/2  42
58  -0.753066    1028/2  43
65   1.268154    1028/2  35
66   1.214893    1028/2  36
...etc

resid contains N floating point numbers. familyid contains different identifiers (keys) like shown above and age contains 10 values: from 35 to 44. Each familyid is represented twice in this data (i.e they are siblings), and I wish to transform this dataframe to a 2*10 column dataframe where:

The first column contains all the resid values for the first age 35 siblings, the second column contains all resid values for the second age 35 siblings, the third column contains all the resid values for the first age 36 sibling, etc.

I hope this was clear. Thank you

Upvotes: 1

Views: 36

Answers (1)

behzad.nouri
behzad.nouri

Reputation: 77991

one way to do that is to tag each sibling as 1 (first) or 2 (second) and then pivot; for example starting with

>>> df
    resid familyid  age
0   -0.86   1028/2   39
1   -0.73   1028/2   40
2   -0.82   1028/2   41
3   -0.76   1028/2   42
4   -0.75   1028/2   43
5    1.27   1028/2   35
6    1.21   1028/2   36
7    1.71   1028/2   39
8    1.47   1028/2   40
9    1.64   1028/2   41
10   1.52   1028/2   42
11   1.51   1028/2   43
12  -2.54   1028/2   35
13  -2.43   1028/2   36

the tagging part can be done by

>>> tagfn = lambda x: 1 + np.arange(len(x))
>>> df['tag'] = df.groupby(['familyid', 'age']).transform(tagfn)

and then:

>>> df.pivot_table('resid', 'familyid', ['age', 'tag'])
age         35          36          39          40          41          42          43      
tag          1     2     1     2     1     2     1     2     1     2     1     2     1     2
familyid                                                                                    
1028/2    1.27 -2.54  1.21 -2.43 -0.86  1.71 -0.73  1.47 -0.82  1.64 -0.76  1.52 -0.75  1.51

[1 rows x 14 columns]

Upvotes: 2

Related Questions