Chinmay Jape

Reputation: 29

Pandas Dataframe from a dictionary

I have a nested dictionary in the following format:

{'X_tr': {'school_state':            col1      col2
  0      0.009099  0.047694
  1      0.004304  0.024660
  2      0.003129  0.019796
  3      0.002541  0.018430
  4      0.009099  0.047694
  ...         ...       ...
  73191  0.013457  0.055167
  73192  0.001530  0.009481
  73193  0.002869  0.015657
  73194  0.002869  0.015657
  73195  0.002118  0.013102
  
  [73196 rows x 2 columns], 'clean_categories':            col1      col2
  0      0.028526  0.188139
  1      0.000478  0.002049
  2      0.007487  0.031532
  3      0.017474  0.115648
  4      0.000997  0.004522

The outer keys are the training and test sets (X_tr and X_test). Under each of them are the categorical variables, such as 'school_state' and 'clean_categories'.

I want to create a DataFrame like the ones below for each category:

School State
Index  col1      col2
 0    .009099  .047694
 1    .004304  .024660
......................
......................

clean_categories
Index col1 col2
..............
..............

Because the dictionary is nested, I am having trouble doing this.

Can someone please suggest an approach?

Upvotes: 0

Views: 83

Answers (1)

Vaebhav

Reputation: 5032

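You can either pull out a separate DataFrame for each split and category, or use concat to build a single DataFrame per category.

The values in the nested dictionary already appear to be DataFrames, so the idea is to reach each one through its keys.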

df_dict = {'X_tr': {'school_state':            col1      col2
  0      0.009099  0.047694
  1      0.004304  0.024660
  2      0.003129  0.019796
  3      0.002541  0.018430
  4      0.009099  0.047694
  ...         ...       ...
  73191  0.013457  0.055167
  73192  0.001530  0.009481
  73193  0.002869  0.015657
  73194  0.002869  0.015657
  73195  0.002118  0.013102
  
  [73196 rows x 2 columns], 'clean_categories':            col1      col2
  0      0.028526  0.188139
  1      0.000478  0.002049
  2      0.007487  0.031532
  3      0.017474  0.115648
  4      0.000997  0.004522

Individual DataFrames for test and train

school_state_tr = df_dict['X_tr']['school_state']
clean_categories_tr = df_dict['X_tr']['clean_categories']

school_state_ts = df_dict['X_test']['school_state']
clean_categories_ts = df_dict['X_test']['clean_categories']
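
If there are many categories, writing one line per key gets tedious. A minimal sketch of the same lookup done in a loop is below. It assumes every value in the nested dictionary is already a DataFrame; the small `df_dict` here is made-up stand-in data, and the name `frames` is only illustrative.

```python
import pandas as pd

# Made-up stand-in for the nested dictionary in the question.
df_dict = {
    "X_tr": {
        "school_state": pd.DataFrame({"col1": [0.1, 0.2], "col2": [0.3, 0.4]}),
        "clean_categories": pd.DataFrame({"col1": [0.5, 0.6], "col2": [0.7, 0.8]}),
    },
    "X_test": {
        "school_state": pd.DataFrame({"col1": [0.9], "col2": [1.0]}),
        "clean_categories": pd.DataFrame({"col1": [1.1], "col2": [1.2]}),
    },
}

# Walk both levels and key each DataFrame by (split, category).
frames = {
    (split, category): frame
    for split, categories in df_dict.items()
    for category, frame in categories.items()
}

# Look up any frame without hard-coding a variable per key.
print(frames[("X_tr", "school_state")])
```

Keeping the frames in one dictionary keyed by a tuple avoids creating a separate variable for each split and category.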

Single DataFrame based on categories

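import pandas as pd

# Tag each frame with the split it came from.
# Note: this also adds the column to the DataFrames stored in df_dict,
# because the variables above refer to the same objects.
school_state_tr['flag'] = 'Train'
clean_categories_tr['flag'] = 'Train'

school_state_ts['flag'] = 'Test'
clean_categories_ts['flag'] = 'Test'

# Stack train and test rows for each category; the original index values are kept.
school_state = pd.concat([school_state_tr, school_state_ts], axis=0)
clean_categories = pd.concat([clean_categories_tr, clean_categories_ts], axis=0)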
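
The per-category concat can also be written once for any number of categories. The sketch below is one possible way, not the only one: the helper `combine_by_category` and the `flags` mapping are hypothetical names, the data is made up, and `DataFrame.assign` is used so the frames inside the dictionary are left unchanged.

```python
import pandas as pd

# Made-up stand-in for the nested dictionary in the question.
df_dict = {
    "X_tr": {
        "school_state": pd.DataFrame({"col1": [0.1, 0.2], "col2": [0.3, 0.4]}),
        "clean_categories": pd.DataFrame({"col1": [0.5, 0.6], "col2": [0.7, 0.8]}),
    },
    "X_test": {
        "school_state": pd.DataFrame({"col1": [0.9], "col2": [1.0]}),
        "clean_categories": pd.DataFrame({"col1": [1.1], "col2": [1.2]}),
    },
}

# Label to attach to the rows of each split (assumed naming).
flags = {"X_tr": "Train", "X_test": "Test"}


def combine_by_category(nested, flags):
    """Return {category: DataFrame} with train and test rows stacked."""
    parts = {}
    for split, categories in nested.items():
        for category, frame in categories.items():
            # assign() returns a copy, so the original frame is not modified.
            parts.setdefault(category, []).append(frame.assign(flag=flags[split]))
    return {
        category: pd.concat(frames, axis=0)
        for category, frames in parts.items()
    }


by_category = combine_by_category(df_dict, flags)
print(by_category["school_state"])
```

The row index of each split is preserved, so index values repeat across train and test; passing `ignore_index=True` to `pd.concat` would renumber the rows instead.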

Upvotes: 1
