Hendrik Wiese

Reputation: 2219

pandas dataframe columns to hierarchical data structure?

I have a pandas dataframe with a number of columns. Some columns are hierarchically groupable. I would like to use this groupability to turn the column structure into a hierarchical structure to be used in a machine learning environment.

Example:

My pandas DataFrame has the columns run, obj_id, data1 and data2, and it can look as follows:

Index    run    obj_id    data1    data2
0        0      0         1.3134   3.4943
1        0      0         2.3311   5.4434
2        1      0         1.3345   6.9942
3        1      0         3.4422   3.5353
4        0      1         4.2233   0.3112

and so on. What I would like to do is train a separate model for each obj_id. Each run should be treated as one batch, and the data columns (data1, data2) should be the features.

The result would probably look like this:

X = [ # obj_id: model
      [ # run: batch
        [ # data_: features
          [1.3134, 3.4943],
          [2.3311, 5.4434]
        ],
        [
          [1.3345, 6.9942],
          [3.4422, 3.5353]
        ]
      ]
    ]

Is there an easy way to do that transformation?

Upvotes: 4

Views: 342

Answers (1)

Quang Hoang

Reputation: 150745

Not the most elegant solution, but nested groupby calls will do it:

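# Outer groupby: one entry per obj_id; inner groupby: one entry per run.
# Note the double brackets: a list is needed to select several columns.
(df.groupby('obj_id')
   .apply(lambda x: x.groupby('run')[['data1', 'data2']]
                     .apply(lambda y: y.values.tolist())
                     .to_list()
         )
   .to_list()
)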

Output:

[
    [
        [
            [1.3134, 3.4943], 
            [2.3311, 5.4434]
        ], 
        [
            [1.3345, 6.9942], 
            [3.4422, 3.5353]
        ]
    ],
    [
        [
            [4.2233, 0.3112]
        ]
    ]
]
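
As a possible alternative to the nested apply calls, the same nesting can be sketched with list comprehensions that iterate over the groupby objects directly. The DataFrame below is rebuilt from the sample rows in the question, and the name feature_cols is an assumption introduced here for illustration only.

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "run":    [0, 0, 1, 1, 0],
    "obj_id": [0, 0, 0, 0, 1],
    "data1":  [1.3134, 2.3311, 1.3345, 3.4422, 4.2233],
    "data2":  [3.4943, 5.4434, 6.9942, 3.5353, 0.3112],
})

# Assumed list of feature columns (hypothetical name, not from the original post).
feature_cols = ["data1", "data2"]

# Outer level: one entry per obj_id (model).
# Middle level: one entry per run (batch).
# Inner level: one [data1, data2] row per sample.
# groupby sorts its keys by default, so entries are ordered by obj_id, then run.
X = [
    [batch[feature_cols].to_numpy().tolist() for _, batch in per_obj.groupby("run")]
    for _, per_obj in df.groupby("obj_id")
]

print(X)
```

Here X[0] holds the batches for obj_id 0 and X[1] those for obj_id 1. Because runs can contain different numbers of rows, the result is kept as nested Python lists rather than a single rectangular array.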

Upvotes: 2
