Reputation: 38649
Let's say I have these two dataframes:
>>> import pandas as pd
>>> df1 = pd.DataFrame({"key":[1,3,5], "columnA":[30,40,50]})
>>> df1
key columnA
0 1 30
1 3 40
2 5 50
>>> df2 = pd.DataFrame({"key":[2,4], "columnB":[60,70]})
>>> df2
key columnB
0 2 60
1 4 70
I basically want a new dataframe, with "key", "columnA", and "columnB", where the corresponding data is "interleaved" from the two above dataframes, correspondingly. I did this:
>>> pd.merge(df1, df2, on='key', how='outer').astype('Int64')
key columnA columnB
0 1 30 <NA>
1 3 40 <NA>
2 5 50 <NA>
3 2 <NA> 60
4 4 <NA> 70
... which comes close - but I want the output to be:
key columnA columnB
0 1 30 <NA>
1 2 <NA> 60
2 3 40 <NA>
3 4 <NA> 70
4 5 50 <NA>
How can I achieve that?
Upvotes: 0
Views: 103
Reputation: 34076
You can use sort_values
and then reset_index
to achieve the expected output.
In [778]: pd.merge(df1, df2, on='key', how='outer').astype('Int64').sort_values('key').reset_index().drop('index',1)
Out[778]:
key columnA columnB
0 1 30 <NA>
1 2 <NA> 60
2 3 40 <NA>
3 4 <NA> 70
4 5 50 <NA>
Or you can pass ignore_index=True
in the sort_values
parameter itself:
In [795]: pd.merge(df1, df2, on='key', how='outer').astype('Int64').sort_values('key', ignore_index=True)
Out[795]:
key columnA columnB
0 1 30 <NA>
1 2 <NA> 60
2 3 40 <NA>
3 4 <NA> 70
4 5 50 <NA>
Upvotes: 3
Reputation: 862851
Use DataFrame.sort_values
with DataFrame.reset_index
and drop=True
parameter:
df = (pd.merge(df1, df2, on='key', how='outer')
.astype('Int64')
.sort_values('key')
.reset_index(drop=True))
print (df)
key columnA columnB
0 1 30 NaN
1 2 NaN 60
2 3 40 NaN
3 4 NaN 70
4 5 50 NaN
Upvotes: 1