Reputation: 1285
Given two dataframes, how can I append to the main df only those rows from the second df that it does not already contain?
For example, given these two dataframes:
...how can I end up with this result?
I would like to involve the index somehow, as my application will be using a DatetimeIndex. A reproducible example, including my attempt at concatenation, is below:
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)
print(df1)
print()

df2 = pd.DataFrame(
    {
        "A": ["A2", "A3", "A4", "A5"],
        "B": ["B2", "B3", "B4", "B5"],
        "C": ["C2", "C3", "C4", "C5"],
        "D": ["D2", "D3", "D4", "D5"],
    },
    index=[2, 3, 4, 5],
)
print(df2)
print()

result = pd.concat([df1, df2], join="inner", ignore_index=False)
print(result)
Upvotes: 0
Views: 876
Reputation: 323326
Use an outer merge. In your case:
out = df1.merge(df2, how='outer')
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A4  B4  C4  D4
5  A5  B5  C5  D5
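One thing to watch for, since the question mentions a DatetimeIndex: a merge on the shared columns returns a fresh integer index, so the 0..5 labels above only happen to line up with the originals. A minimal sketch of one way to carry the index through the merge follows. The dates, the two-column frames, and the column name "ts" are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data with a DatetimeIndex; dates and column set are illustrative only.
idx1 = pd.date_range("2024-01-01", periods=4, freq="D")
idx2 = pd.date_range("2024-01-03", periods=4, freq="D")
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["A2", "A3", "A4", "A5"], "B": ["B2", "B3", "B4", "B5"]}, index=idx2)

# A plain outer merge joins on the shared columns and discards both indexes.
plain = df1.merge(df2, how="outer")

# Moving the index into a column ("ts" is an assumed name) makes it part of
# the join key, so it can be restored as the index afterwards.
left = df1.rename_axis("ts").reset_index()
right = df2.rename_axis("ts").reset_index()
kept = left.merge(right, how="outer").set_index("ts").sort_index()

print(plain.index)
print(kept.index)
```

With the timestamps in the join key, two rows count as the same only when both the timestamp and the values match.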
Upvotes: 1
Reputation: 336
After concatenation, you can drop the duplicates using the drop_duplicates() method.
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)
print(df1)
print()

df2 = pd.DataFrame(
    {
        "A": ["A2", "A3", "A4", "A5"],
        "B": ["B2", "B3", "B4", "B5"],
        "C": ["C2", "C3", "C4", "C5"],
        "D": ["D2", "D3", "D4", "D5"],
    },
    index=[2, 3, 4, 5],
)
print(df2)
print()

# Stack both frames, then drop rows whose column values repeat an earlier row.
result = pd.concat([df1, df2], join="inner", ignore_index=False)
result = result.drop_duplicates()
print(result)
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html
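Note that drop_duplicates() compares column values only and ignores the index. With a DatetimeIndex that may not be what you want, because equal readings at different timestamps would collapse into one row. Below is a rough sketch of three variants, using made-up data (the "value" column, the dates, and the helper column name "ts" are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical readings; df2 repeats one timestamp from df1 and adds two new ones.
df1 = pd.DataFrame(
    {"value": [1.0, 2.0, 2.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
df2 = pd.DataFrame(
    {"value": [2.0, 2.0, 3.0]},
    index=pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"]),
)

combined = pd.concat([df1, df2])

# Values only: rows at different timestamps with equal values collapse into one.
by_values = combined.drop_duplicates()

# Index only: keep the first row seen for each timestamp (df1 wins on overlap).
by_index = combined[~combined.index.duplicated(keep="first")]

# Index and values together: a row is dropped only if both match an earlier row.
by_both = (
    combined.rename_axis("ts")
    .reset_index()
    .drop_duplicates()
    .set_index("ts")
)

print(len(by_values), len(by_index), len(by_both))
```

Which variant fits depends on whether a repeated timestamp with different values should be treated as a new row or as a conflict.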
Upvotes: 1