Reputation: 362
I have two dataframes df1 and df2.
df1 = pd.DataFrame({'type_id' : [1,2,3,4,3], 'count' : [12,11,15,16,2], 'unique_id' : ['1|12','2|11','3|15','4|16','3|2']})
df2 = pd.DataFrame({'type_id' : [1,3,76,12,11,1], 'count' : [8,6,15,16,5,17], 'col3' : [1,5,7,3,4,7], 'unique_id' : ['1|8','3|6','76|12','12|16','11|5','1|17']})
df1
type_id count unique_id
0 1 12 1|12
1 2 11 2|11
2 3 15 3|15
3 4 16 4|16
4 3 2 3|2
df2
type_id count col3 unique_id
0 1 8 1 1|8
1 3 6 5 3|6
2 76 15 7 76|12
3 12 16 3 12|16
4 11 5 4 11|5
5 1 17 7 1|17
Here, unique_id is a combination of type_id and count.
Now, I want to retrieve rows from df2 where df2['type_id'] == df1['id'] & df2['count']<df1['count']
i.e., I want rows where the two ids match and the corresponding count in df2 is less than the count of df1
Expected output:
type_id count col3 unique_id
0 1 8 1 1|8
1 3 6 5 3|6
when I try doing this directly I'm getting an error :
Can only compare identically-labeled Series objects
How do I perform this operation? Thanks in advance.
Upvotes: 4
Views: 799
Reputation: 862611
Use Series.map
for match by id
columns to get Series
with same length like df2
, so possible compare by df2['count']
and filter by boolean indexing
:
df = df2[df2['count'] < df2['type_id'].map(df1.set_index('type_id')['count'])]
print (df)
type_id count col3 unique_id
0 1 8 1 1|8
1 3 6 5 3|6
Detail:
print (df2['type_id'].map(df1.set_index('type_id')['count']))
0 12.0
1 15.0
2 NaN
3 NaN
4 NaN
5 12.0
Name: type_id, dtype: float64
EDIT: Because duplicates in df1['type_id']
is possible create unique unique_id
by chain counter column by GroupBy.cumcount
:
df1 = pd.DataFrame({'type_id' : [1,1,1,4], 'count' : [12,11,15,16]})
df2 = pd.DataFrame({'type_id' : [1,3,76,12,11,1],
'count' : [8,6,15,16,5,17], 'col3' : [1,5,7,3,4,7]})
df1['unique_id1'] = (df1['type_id'].astype(str) + '|' +
df1.groupby('type_id').cumcount().astype(str))
df2['unique_id1'] = (df2['type_id'].astype(str) + '|' +
df2.groupby('type_id').cumcount().astype(str))
print (df1)
type_id count unique_id1
0 1 12 1|0
1 1 11 1|1
2 1 15 1|2
3 4 16 4|0
print (df2)
type_id count col3 unique_id1
0 1 8 1 1|0
1 3 6 5 3|0
2 76 15 7 76|0
3 12 16 3 12|0
4 11 5 4 11|0
5 1 17 7 1|1
df = df2[df2['count'] < df2['unique_id1'].map(df1.set_index('unique_id1')['count'])]
print (df)
type_id count col3 unique_id
0 1 8 1 1|0
Upvotes: 3