Reputation: 924
I'm working with a few DataFrames. One of them has some added columns, and some removed columns. The first column of both is a date column, and df2 picks up where the df1 left off, chronologically.
df1:
day alice bob
8/11 0 0
8/25 2 5
9/1 2 0
df2:
day alice charlie
9/12 1 1
9/25 2 3
9/1 2 1
I would like to combine them, to get one DataFrame with all dates and all columns present.
df3:
day alice bob charlie
8/11 0 0 0
8/25 2 5 0
9/1 2 0 0
9/12 1 0 1
9/25 2 0 3
9/1 2 0 1
When I combine my actual 2 DataFrames with pd.concat or pd append, i get the following error:
AssertionError: Number of manager items must equal union of block items
# manager items: 65, # tot_items: 66
Not sure what the issue could be. In the meantime, thanks and you're awesome. I can provide to you a more fleshed out example if necessary.
Upvotes: 1
Views: 700
Reputation: 862681
Solution is correct:
df = pd.concat([df1,df2]).fillna(0)
But here is error:
AssertionError: Number of manager items must equal union of block items # manager items: 65, # tot_items: 66
It means there are duplicated columns names, you can verify it:
print (df1.loc[:, df1.columns.duplicated(keep=False)])
print (df2.loc[:, df2.columns.duplicated(keep=False)])
If same values in columns is possible remove duplicated by:
df1 = df1.loc[:, ~df1.columns.duplicated()]
df2 = df2.loc[:, ~df2.columns.duplicated()]
df = pd.concat([df1,df2]).fillna(0)
Upvotes: 2
Reputation: 122
In order to get a DataFrame with all dates and columns present, you can use DataFrame.combine_first.
import pandas as pd
import numpy as np
# Create df1
df1 = pd.DataFrame({
'day': ['8/11', '8/25', '9/1'],
'alice': [0, 2, 2],
'bob': [0, 5, 0]
}).set_index('day')
# Create df2
df2 = pd.DataFrame({
'day': ['9/12', '9/25', '9/1'],
'alice': [1, 2, 2],
'charlie': [1, 3, 1]
}).set_index('day')
# Do combine_first
df = df2.combine_first(df1).fillna(0)
df1
alice bob
day
8/11 0 0
8/25 2 5
9/1 2 0
df2
alice charlie
day
9/12 1 1
9/25 2 3
9/1 2 1
alice bob charlie
day
8/11 0.0 0.0 0.0
8/25 2.0 5.0 0.0
9/1 2.0 0.0 1.0
9/12 1.0 0.0 1.0
9/25 2.0 0.0 3.0
Upvotes: 0
Reputation: 1939
df1['charlie']=0
df2['bob']=0
df=pd.concat([df1,df2]).reset_index().drop(['index'],axis=1)
Explanation: You can add the columns missing in the respective data frames. Now, if you concat(), no error will be raised.
Upvotes: 0
Reputation: 4864
I think this will work:
df3 = pd.concat([df1.set_index("day"), df2.set_index("day")]).fillna(value=0).reset_index()
Upvotes: 0