I have a pandas DataFrame with several columns, some of which define a hierarchical grouping. I would like to use that grouping to turn the flat frame into a nested structure that I can feed to a machine learning framework.
Example:
My frame has the columns run and obj_id plus several data columns (data1, data2, ...), and it can look like this:
Index  run  obj_id  data1   data2
0      0    0       1.3134  3.4943
1      0    0       2.3311  5.4434
2      1    0       1.3345  6.9942
3      1    0       3.4422  3.5353
4      0    1       4.2233  0.3112
and so on. First, I would like to train a separate model for each obj_id. Then each run should be treated as one batch, and the data columns should be the features.
The result would probably look like this:
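X = [                          # level 1: one entry per obj_id (one model each)
    [                          # level 2: runs of obj_id 0 (one batch each)
        [                      # level 3: rows of run 0, each row = [data1, data2] features
            [1.3134, 3.4943],
            [2.3311, 5.4434]
        ],
        [
            [1.3345, 6.9942],
            [3.4422, 3.5353]
        ]
    ],
    # ... and so on for the other obj_ids
]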
Is there an easy way to do that transformation?
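To try the answers, the five example rows shown above can be rebuilt as a DataFrame. This is only the sample data from the table; the "and so on" rows are left out.

```python
import pandas as pd

# The five example rows from the table; columns in the same order.
df = pd.DataFrame({
    'run':    [0, 0, 1, 1, 0],
    'obj_id': [0, 0, 0, 0, 1],
    'data1':  [1.3134, 2.3311, 1.3345, 3.4422, 4.2233],
    'data2':  [3.4943, 5.4434, 6.9942, 3.5353, 0.3112],
})
print(df)
```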
Not the best solution, but you can do:
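import pandas as pd  # df is the frame from the question

# Outer groupby: one list per obj_id (model); inner groupby: one list per run (batch);
# each run becomes a list of [data1, data2] rows (features).
# Columns are selected with a list: tuple-style ['data1','data2'] is an error in pandas 2.x.
# Selecting ['run', 'data1', 'data2'] also keeps apply() off the obj_id grouping column.
X = (df.groupby('obj_id')[['run', 'data1', 'data2']]
       .apply(lambda x: x.groupby('run')[['data1', 'data2']]
                         .apply(lambda y: y.values.tolist())
                         .to_list())
       .to_list()
)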
Output:
[
    [
        [
            [1.3134, 3.4943],
            [2.3311, 5.4434]
        ],
        [
            [1.3345, 6.9942],
            [3.4422, 3.5353]
        ]
    ],
    [
        [
            [4.2233, 0.3112]
        ]
    ]
]
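A variant that keeps the obj_id keys, so each nested list can be matched to the model trained on it, is a dict built with explicit loops over groupby. This is a minimal sketch, not the answer's apply-based code, and it assumes the feature columns are exactly those whose names start with 'data' (a convention inferred from data1/data2). The example frame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'run':    [0, 0, 1, 1, 0],
    'obj_id': [0, 0, 0, 0, 1],
    'data1':  [1.3134, 2.3311, 1.3345, 3.4422, 4.2233],
    'data2':  [3.4943, 5.4434, 6.9942, 3.5353, 0.3112],
})

# Assumption: every column whose name starts with 'data' is a feature.
feature_cols = [c for c in df.columns if c.startswith('data')]

# obj_id -> list of batches (one per run) -> list of feature rows.
# groupby sorts the keys by default, so obj_ids and runs come out in ascending order.
X = {
    obj_id: [run_df[feature_cols].to_numpy().tolist()
             for _, run_df in obj_df.groupby('run')]
    for obj_id, obj_df in df.groupby('obj_id')
}

for obj_id, batches in X.items():
    print(obj_id, batches)
```

Because the keys are sorted and dicts preserve insertion order, list(X.values()) should give the same nested list as the output above.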
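If the machine learning framework expects dense arrays rather than nested lists, each obj_id's runs can be stacked into one array of shape (n_runs, n_rows, n_features). This sketch assumes that all runs of a given obj_id contain the same number of rows; the helper name runs_to_array is hypothetical, and ragged runs would need padding or a ragged container instead.

```python
import numpy as np
import pandas as pd

# Example rows from the question (rebuilt here so the block runs on its own).
df = pd.DataFrame({
    'run':    [0, 0, 1, 1, 0],
    'obj_id': [0, 0, 0, 0, 1],
    'data1':  [1.3134, 2.3311, 1.3345, 3.4422, 4.2233],
    'data2':  [3.4943, 5.4434, 6.9942, 3.5353, 0.3112],
})
feature_cols = ['data1', 'data2']


def runs_to_array(obj_df: pd.DataFrame, feature_cols: list) -> np.ndarray:
    """Stack the runs of one obj_id into shape (n_runs, n_rows, n_features).

    Assumes every run of this obj_id has the same number of rows;
    np.stack raises ValueError if the per-run arrays differ in shape.
    """
    runs = [run_df[feature_cols].to_numpy()
            for _, run_df in obj_df.groupby('run')]
    return np.stack(runs)


# One dense array per obj_id, i.e. one training set per model.
batches = {obj_id: runs_to_array(obj_df, feature_cols)
           for obj_id, obj_df in df.groupby('obj_id')}

for obj_id, arr in batches.items():
    print(obj_id, arr.shape)
```

For the five example rows this gives shape (2, 2, 2) for obj_id 0 (two runs of two rows each) and (1, 1, 2) for obj_id 1 (one run with a single row).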