NotAName

Reputation: 4322

Need a more efficient way of creating a dictionary of dataframes from a single large dataframe

The issue is that I have a large dataframe (several million rows) and I need to split it into separate dataframes based on the value of the metric column (which can have several thousand unique values), then put all the individual dataframes into a dictionary.

The data looks like this:

>>> df.sample(20)
Out[104]: 
                        time       mhi                 metric
1953310  2020-09-26 09:57:59  0.364575   100004_uf7-15_l14-40
5748967  2020-11-15 14:50:27  0.430073  100004_uf11-15_l10-45
3124709  2020-10-17 23:32:50  1.000000   100004_uf5-21_l26-40
2201278  2020-10-01 12:30:26  0.020645  100004_uf09-27_l26-46
5515393  2020-11-14 03:48:50  1.000000   100004_uf9-18_l26-35
1813859  2020-09-25 00:48:42  0.572557   100004_uf7-24_l10-40
1656151  2020-09-24 00:39:28  0.673656  100004_uf07-24_l32-42
4796411  2020-11-10 09:21:54  1.000000   100004_uf5-15_l22-30
92122    2020-07-06 07:20:37  1.000000   100004_uf5-21_l26-30
3690550  2020-10-25 23:40:57  0.268361  100004_uf09-18_l28-42
4946382  2020-11-11 01:58:22  1.000000   100004_uf5-18_l22-35
3899731  2020-11-01 11:48:08  1.000000   100004_uf7-15_l22-30
5996972  2020-11-17 10:55:22  1.000000  100004_uf07-21_l32-42
7471727  2021-01-01 11:52:45  1.000000  100004_uf07-27_l30-42
3669036  2020-10-25 20:10:33  1.000000   100004_uf5-21_l10-35
1166225  2020-09-17 11:58:21  1.000000   100004_uf7-15_l22-30
5832113  2020-11-16 02:52:32  0.349082  100004_uf07-21_l28-54
1458903  2020-09-21 21:04:32  0.524897  100004_uf07-18_l30-42
3094785  2020-10-17 15:46:02  1.000000   100004_uf5-24_l18-30
674615   2020-08-05 02:31:14  0.401657  100004_uf11-18_l34-46

What I'm currently doing is this:

versions = df.metric.unique()
mhi_dict = {ver: df.loc[df.metric == ver] for ver in versions}

However, this is very time-consuming: it takes over 5 minutes on average for ~1500 unique versions. Is there a way to speed it up?

Upvotes: 0

Views: 120

Answers (1)

Agyey Arya

Reputation: 240

# Group once, then iterate: each step yields a (key, sub-DataFrame) pair,
# so the full frame is not re-scanned for every unique metric value.
mhi_dict = {key: group for key, group in df.groupby('metric')}
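
For context on why this should help: the loop in the question evaluates df.metric == ver once per version, and each comparison scans every row, so ~1500 versions means ~1500 full passes over the frame. groupby works out the row-to-group mapping once. Below is a minimal, self-contained sketch on synthetic data that checks the two approaches give the same sub-frames. The generated values, the row count, and the use of sort=False are my own assumptions for illustration, not something taken from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (hypothetical values, same column names).
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "time": pd.date_range("2020-07-01", periods=n_rows, freq="min"),
    "mhi": rng.random(n_rows),
    "metric": rng.choice([f"100004_uf{i}" for i in range(50)], size=n_rows),
})

# Iterating a GroupBy yields (key, sub-frame) pairs.
# sort=False skips sorting the group keys, which a dict does not need.
mhi_dict = {key: group for key, group in df.groupby("metric", sort=False)}

# Sanity check against the boolean-mask approach from the question.
for ver in df["metric"].unique():
    assert mhi_dict[ver].equals(df.loc[df["metric"] == ver])
```

Each sub-frame keeps its original index and row order, so it should be a drop-in replacement for the dictionary built with boolean masks. How much faster it is on the real data is something you would have to time yourself.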

Upvotes: 1
