Reputation: 1145
I have a main dataframe df1 which is 'empty', and some other dataframes of different lengths but with the same columns as df1. I would like to merge/join/concat df2 and df3 at certain index positions of df1:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(index=np.arange(20), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])
df3 = pd.DataFrame(np.random.randn(2, 4), columns=['A', 'B', 'C', 'D'])
df4 = pd.concat([df1, df2, df3], axis=0, keys=['', 'df2', 'df3'])
However, concat([df1,df2,df3], axis=0, keys....) just stacks the dfs one after another...
As an example, df2 should start at df1.index[3] and df3 at df1.index[12], leaving the NaNs of df1 before, between and after df2 and df3 in place.
Upvotes: 0
Views: 1153
Reputation: 394041
You won't be able to achieve what you want using concat like this, or merge for that matter, without reindexing. By default, concat keeps the original index values and just stacks the dfs after each other. If you tried to merge on the indices, the shared column names would clash and create additional columns, e.g. 'X_x', 'X_y', etc.
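To make the two failure modes concrete, here is a small sketch. It reuses the shapes from the question; `dtype=float` on df1 and the `how="left"` merge are my own choices for the illustration, not something from the original post.

```python
import numpy as np
import pandas as pd

cols = ["A", "B", "C", "D"]
# dtype=float is an assumption for this sketch; the question's df1 has object dtype
df1 = pd.DataFrame(index=np.arange(20), columns=cols, dtype=float)
df2 = pd.DataFrame(np.random.randn(5, 4), columns=cols)

# concat keeps each frame's own labels and appends the rows at the end
stacked = pd.concat([df1, df2], axis=0)
print(stacked.shape)                     # (25, 4)
print(stacked.index.duplicated().any())  # True: labels 0..4 now appear twice

# merging on the index keeps both sets of columns and suffixes the clashing names
merged = df1.merge(df2, left_index=True, right_index=True, how="left")
print(list(merged.columns))
```

Neither result puts the df2 values into rows 3 to 7 of df1, which is why the index has to be changed first.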
The easiest way is to reindex the dfs and then call update to overwrite the NaNs at the desired rows:
In [55]:
df1 = pd.DataFrame(index=np.arange(20), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'], index=np.arange(3, 8))
df3 = pd.DataFrame(np.random.randn(2, 4), columns=['A', 'B', 'C', 'D'], index=np.arange(12, 14))
df1.update(df2)  # update works in place and aligns on the index
df1.update(df3)
df1
Out[55]:
             A           B           C          D
0          NaN         NaN         NaN        NaN
1          NaN         NaN         NaN        NaN
2          NaN         NaN         NaN        NaN
3     1.315786  -0.4132373    1.084278  -0.945918
4     2.663868   0.8093393  -0.9853748  -1.428507
5    0.1414072   0.1896291    1.562247  0.1180398
6     1.699411    1.518535  -0.1854568  -0.430911
7    -1.716514    1.053428   0.3587436   1.471504
8          NaN         NaN         NaN        NaN
9          NaN         NaN         NaN        NaN
10         NaN         NaN         NaN        NaN
11         NaN         NaN         NaN        NaN
12  -0.5668472   -0.137524   0.1769178  0.7065836
13   -1.726548   0.2645793   0.1484438  -1.099373
14         NaN         NaN         NaN        NaN
15         NaN         NaN         NaN        NaN
16         NaN         NaN         NaN        NaN
17         NaN         NaN         NaN        NaN
18         NaN         NaN         NaN        NaN
19         NaN         NaN         NaN        NaN
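Two things about update are worth knowing before relying on it: it only copies non-NaN values, and it never adds rows, so labels that don't exist in df1 are dropped without any error. A minimal sketch of both, where the `block` frame and `dtype=float` are made up for the illustration:

```python
import numpy as np
import pandas as pd

cols = ["A", "B", "C", "D"]
# dtype=float is an assumption for this sketch
df1 = pd.DataFrame(index=np.arange(20), columns=cols, dtype=float)

# hypothetical block whose labels (18..22) partly run past the end of df1
block = pd.DataFrame(np.ones((5, 4)), columns=cols, index=np.arange(18, 23))
block.loc[18, "A"] = np.nan  # a NaN in the source never overwrites the target

df1.update(block)  # in place; labels 20..22 are not in df1 and are ignored

print(df1.shape)    # (20, 4): df1 did not grow
print(df1.tail(3))  # row 18 keeps NaN in A, the rest of rows 18 and 19 are 1.0
```

So if a block might overrun the end of df1, check its index against df1.index first.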
So you can modify the index after creation like this:
In [56]:
df2 = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])
print(df2)
df2.index = np.arange(3, 8)  # relabel the rows so they line up with rows 3..7 of df1
df2
          A         B         C         D
0 -0.417199  0.184960 -1.056915  0.674005
1  0.592654  1.131421 -0.958991  0.831648
2 -1.087500  0.664596 -1.508290  2.453404
3  1.441886 -0.131998  0.040513 -2.675875
4  1.547340 -0.230203  1.335917 -1.568939
Out[56]:
          A         B         C         D
3 -0.417199  0.184960 -1.056915  0.674005
4  0.592654  1.131421 -0.958991  0.831648
5 -1.087500  0.664596 -1.508290  2.453404
6  1.441886 -0.131998  0.040513 -2.675875
7  1.547340 -0.230203  1.335917 -1.568939
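If there are several frames to place, the relabel-then-update steps can be wrapped in a small function. This is only a sketch: `place_blocks` is a hypothetical helper name, it assumes the blocks contain no NaNs you want copied, and `dtype=float` on df1 is again my assumption.

```python
import numpy as np
import pandas as pd

def place_blocks(target, blocks):
    """Hypothetical helper: return a copy of `target` with each
    (start, frame) pair written at row position `start`."""
    out = target.copy()
    for start, frame in blocks:
        stop = start + len(frame)
        if start < 0 or stop > len(out):
            raise ValueError(f"block at {start} does not fit into {len(out)} rows")
        # give the block the target's own labels for those positions
        shifted = frame.set_axis(out.index[start:stop], axis=0)
        out.update(shifted)  # in place on the copy, aligned on the index
    return out

cols = ["A", "B", "C", "D"]
df1 = pd.DataFrame(index=np.arange(20), columns=cols, dtype=float)
df2 = pd.DataFrame(np.random.randn(5, 4), columns=cols)
df3 = pd.DataFrame(np.random.randn(2, 4), columns=cols)

df4 = place_blocks(df1, [(3, df2), (12, df3)])
print(df4)
```

Working on a copy leaves df1 empty so it can be reused, and the length check turns a block that would overrun the end into an error instead of a silent truncation.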
Upvotes: 1