Reputation: 610
I have two dataframes df1 and df2 with key
as index.
dict_1={'key':[1,1,1,2,2,3], 'col1':['a1','b1','c1','d1','e1','f1']}
df1 = pd.DataFrame(dict_1).set_index('key')
dict_2={'key':[1,1,2], 'col2':['a2','b2','c2']}
df2 = pd.DataFrame(dict_2).set_index('key')
df1:
col1
key
1 a1
1 b1
1 c1
2 d1
2 e1
3 f1
df2
col2
key
1 a2
1 b2
2 c2
Note that there are unequal rows for each index. I want to concatenate these two dataframes such that, I have the following dataframe (say df3).
df3
col1 col2
key
1 a1 a2
1 b1 b2
2 d1 c2
i.e. concatenate the two columns so that the new dataframe as the least (of df1 and df2) rows for each index.
I tried
pd.concat([df1,df2],axis=1)
but I get the following error:
Value Error: Shape of passed values is (2,17), indices imply (2,7)
My question: How can I concatentate df1
and df2
to get df3
? Should I use DataFrame.merge
instead? If so, how?
Upvotes: 2
Views: 751
Reputation: 5222
Merge/join alone will get you a lot of (hard to get rid of) duplicates. But a little trick will help:
df1['count1'] = 1
df1['count1'] = df1['count1'].groupby(df1.index).cumsum()
df1
Out[198]:
col1 count1
key
1 a1 1
1 b1 2
1 c1 3
2 d1 1
2 e1 2
3 f1 1
The same thing for df2
:
df2['count2'] = 1
df2['count2'] = df2['count2'].groupby(df2.index).cumsum()
And finally:
df_aligned = df1.reset_index().merge(df2.reset_index(), left_on = ['key','count1'], right_on = ['key', 'count2'])
df_aligned
Out[199]:
key col1 count1 col2 count2
0 1 a1 1 a2 1
1 1 b1 2 b2 2
2 2 d1 1 c2 1
Now, you can reset index with set_index('key')
and drop no longer needed columns countn
.
Upvotes: 2
Reputation: 5591
The biggest problem for why you are not going to be able to line up the two in the way that you want is that your keys are duplicative. How are you going to be line up the A1 value in df1
with the A2 value in df2
When A1, A2, B1, B2, and C1 all have the same key?
Using merge is what you'll want if you can resolve the key issues:
df3 = df1.merge(df2, left_index=True, right_index=True, how='inner')
You can use inner
, outer
, left
or right
for how
.
Upvotes: 0