Reputation: 111
How can I concatenate a dictionary of DataFrames that have different sets of columns? The index of each DataFrame starts at 0, so the indices are duplicated across DataFrames. Some DataFrames are missing a few columns, while others contain all of them.
Here is an example:
df1
idx col1 col2 col3
0 1 1 1
1 2 2 2
df2
idx col1 col3
0 1 1
1 2 2
df3
idx col1 col2 col3
0 1 1 1
1 2 2 2
desired_result
idx col1 col2 col3
0 1 1 1
1 2 2 2
2 1 nan 1
3 2 nan 2
4 1 1 1
5 2 2 2
So I would like to fill the missing columns with NaN and combine everything into a single DataFrame that has all the columns.
Thanks in advance !
Upvotes: 0
Views: 224
Reputation: 13582
Assuming the dictionary of DataFrames looks like the following
import pandas as pd

df_dict = {
    'df1': pd.DataFrame({'idx': [1, 2], 'col1': [1, 2], 'col2': [1, 2], 'col3': [1, 2]}),
    'df2': pd.DataFrame({'idx': [1, 2], 'col1': [1, 2], 'col3': [1, 2]}),
    'df3': pd.DataFrame({'idx': [1, 2], 'col1': [1, 2], 'col2': [1, 2], 'col3': [1, 2]})
}
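One can simply use pandas.concat with ignore_index=True. This discards the original (duplicate) row labels and renumbers the rows from 0, and any column missing from a DataFrame is filled with NaN: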
df = pd.concat(df_dict.values(), ignore_index=True)
[Out]:
   idx  col1  col2  col3
0    1     1   1.0     1
1    2     2   2.0     2
2    1     1   NaN     1
3    2     2   NaN     2
4    1     1   1.0     1
5    2     2   2.0     2
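In the question, idx appears to be the default 0-based index of each DataFrame and not a regular column. The following is a minimal sketch under that assumption (the name frames is only illustrative); the same pd.concat call applies, and ignore_index=True is what turns the repeated 0, 1 labels into 0 through 5:

```python
import pandas as pd

# Frames with the default 0-based index, as in the question (no 'idx' column).
frames = {
    "df1": pd.DataFrame({"col1": [1, 2], "col2": [1, 2], "col3": [1, 2]}),
    "df2": pd.DataFrame({"col1": [1, 2], "col3": [1, 2]}),
    "df3": pd.DataFrame({"col1": [1, 2], "col2": [1, 2], "col3": [1, 2]}),
}

# ignore_index=True drops the duplicate 0/1 labels and renumbers the rows;
# columns that a frame lacks (col2 in df2) are filled with NaN.
result = pd.concat(frames.values(), ignore_index=True)
print(result)
```

Without ignore_index=True, the result would keep the labels 0, 1, 0, 1, 0, 1.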
If one wants the idx column to be the index, one can chain pandas.DataFrame.set_index as follows (note that the resulting index contains duplicate labels)
df = pd.concat(df_dict.values(), ignore_index=True).set_index('idx')
[Out]:
     col1  col2  col3
idx
1       1   1.0     1
2       2   2.0     2
1       1   NaN     1
2       2   NaN     2
1       1   1.0     1
2       2   2.0     2
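If it matters which dictionary entry each row came from, one possible variation is to pass the dictionary itself to pd.concat, which uses the keys as an outer index level. This is only a sketch: the level names "source" and "row" are hypothetical labels chosen for illustration, and the frames are redefined so the example runs on its own.

```python
import pandas as pd

frames = {
    "df1": pd.DataFrame({"col1": [1, 2], "col2": [1, 2], "col3": [1, 2]}),
    "df2": pd.DataFrame({"col1": [1, 2], "col3": [1, 2]}),
    "df3": pd.DataFrame({"col1": [1, 2], "col2": [1, 2], "col3": [1, 2]}),
}

# Passing the mapping makes its keys the outer level of a MultiIndex.
# "source" and "row" are illustrative names for the two index levels.
stacked = pd.concat(frames, names=["source", "row"])

# Move the key level into a regular column, then renumber the rows from 0.
tagged = stacked.reset_index(level="source").reset_index(drop=True)
print(tagged)
```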
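Notes:
col2 becomes float64 because NaN cannot be stored in a regular integer column. If one doesn't want col2 to be float64 (following this user's comment), one approach is to convert it to the nullable integer type pandas.Int64Dtype. Applied to the result of the first pd.concat call (where idx is still a column), it looks as follows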
df['col2'] = df['col2'].astype(pd.Int64Dtype())
[Out]:
   idx  col1  col2  col3
0    1     1     1     1
1    2     2     2     2
2    1     1  <NA>     1
3    2     2  <NA>     2
4    1     1     1     1
5    2     2     2     2
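To see the dtype change in isolation, here is a small self-contained sketch on a cut-down pair of frames (the data is made up for illustration). It uses the string alias "Int64", which refers to the same nullable integer dtype as pd.Int64Dtype():

```python
import pandas as pd

# Two small frames; the second one lacks col2.
df = pd.concat(
    [
        pd.DataFrame({"col1": [1, 2], "col2": [1, 2]}),
        pd.DataFrame({"col1": [1, 2]}),
    ],
    ignore_index=True,
)

dtype_before = df["col2"].dtype  # float64, because of the NaN fill

# Convert to the nullable integer dtype; NaN becomes <NA>.
df["col2"] = df["col2"].astype("Int64")
dtype_after = df["col2"].dtype

print(dtype_before, "->", dtype_after)
```

The conversion assumes the non-missing values are whole numbers; floats with a fractional part cannot be cast safely to Int64.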
Upvotes: 1