Reputation: 715
I've got four Pandas DataFrames with numerical columns and indices:
A = pd.DataFrame(data={"435000": [9.792, 9.795], "435002": [9.825, 9.812]}, index=[119000, 119002])
B = pd.DataFrame(data={"435004": [9.805, 9.783], "435006": [9.785, 9.78]}, index=[119000, 119002])
C = pd.DataFrame(data={"435000": [9.778, 9.743], "435002": [9.75, 9.743]}, index=[119004, 119006])
D = pd.DataFrame(data={"435004": [9.743, 9.743], "435006": [9.762, 9.738]}, index=[119004, 119006])
I want to concatenate them into one DataFrame like this, matching on both column names and indices:
If I try to pd.concat
the four dfs, they are stacked (either above and below, or to the side, depending on axis
) and I end up with NaN
values in the df:
result = pd.concat([A, B, C, D], axis=0)
How can I use pd.concat
(or merge
, join
etc.) to get the right result?
Upvotes: 14
Views: 26918
Reputation:
You can use join too:
pd.concat((A.join(B), C.join(D)))
Out:
435000 435002 435004 435006
119000 9.792 9.825 9.805 9.785
119002 9.795 9.812 9.783 9.780
119004 9.778 9.750 9.743 9.762
119006 9.743 9.743 9.743 9.738
Upvotes: 6
Reputation: 863801
You need concat in pairs:
result = pd.concat([pd.concat([A, C], axis=0), pd.concat([B, D], axis=0)], axis=1)
print (result)
435000 435002 435004 435006
119000 9.792 9.825 9.805 9.785
119002 9.795 9.812 9.783 9.780
119004 9.778 9.750 9.743 9.762
119006 9.743 9.743 9.743 9.738
Better is stack
+ concat
+ unstack
:
result = pd.concat([A.stack(), B.stack(), C.stack(), D.stack()], axis=0).unstack()
print (result)
435000 435002 435004 435006
119000 9.792 9.825 9.805 9.785
119002 9.795 9.812 9.783 9.780
119004 9.778 9.750 9.743 9.762
119006 9.743 9.743 9.743 9.738
More dynamic:
dfs = [A,B,C,D]
result = pd.concat([df.stack() for df in dfs], axis=0).unstack()
print (result)
435000 435002 435004 435006
119000 9.792 9.825 9.805 9.785
119002 9.795 9.812 9.783 9.780
119004 9.778 9.750 9.743 9.762
119006 9.743 9.743 9.743 9.738
Upvotes: 19