Reputation: 1946
How can I find non-matching values, by date, from two separate dataframes?
My dataframes look like:
import pandas as pd
data_1 = {'date':['2019-05-01','2019-05-02'],
'id': ['1122','1133']}
data_2 = {'id': ['1122','1133','1144']}
df1 = pd.DataFrame(data_1, columns=['date','id'])
df2 = pd.DataFrame(data_2, columns=['id'])
The result I need have is a dataframe that contains any id from df2 that doesn't match df1 and the date from df1.
So the results should look like;
date id
2019-05-01 1133
2019-05-01 1144
2019-05-02 1122
2019-05-02 1144
Upvotes: 1
Views: 63
Reputation: 294258
append
and drop_duplicates
Create a starting dataframe consisting of a product of dates from df1
and ids from df2
d = pd.DataFrame([
[d, i] for d in df1.date
for i in df2.id
], columns=df1.columns)
d.append(df1).drop_duplicates(keep=False)
date id
1 2019-05-01 1133
2 2019-05-01 1144
3 2019-05-02 1122
5 2019-05-02 1144
tups = {*zip(*map(df1.get, df1))}
pd.DataFrame([
[d, i] for d in df1.date
for i in df2.id if (d, i) not in tups
], columns=df1.columns)
date id
0 2019-05-01 1133
1 2019-05-01 1144
2 2019-05-02 1122
3 2019-05-02 1144
itertools.product
and some set
logicMy favorite by the way!
from itertools import product
pd.DataFrame(
{*product(df1.date, df2.id)} - {*zip(*map(df1.get, df1))},
columns=df1.columns
)
date id
0 2019-05-01 1144
1 2019-05-01 1133
2 2019-05-02 1144
3 2019-05-02 1122
Upvotes: 1
Reputation: 323236
This is more like CJ(cross join) problem with filter
df1.assign(key=1).merge(df2.assign(key=1),on='key').loc[lambda x : x['id_x']!=x['id_y']].drop(['key','id_x'],1)
Out[262]:
date id_y
1 2019-05-01 1133
2 2019-05-01 1144
3 2019-05-02 1122
5 2019-05-02 1144
Upvotes: 1
Reputation: 150745
Try:
# first we change `df1.id` to columns by crosstab:
u = pd.crosstab(df1.date, df1.id)
# extend the id with df2.id
u = u.reindex(df2['id'], axis=1, fill_value=0).stack()
Then what you are looking for is
u[u.eq(0)].index.to_frame().reset_index(drop=True)
Output:
date id
0 2019-05-01 1133
1 2019-05-01 1144
2 2019-05-02 1122
3 2019-05-02 1144
Upvotes: 0