Reputation: 709
I have two DataFrames
in pandas
:
dfm_one
data group_a group_b
0 3 a z
1 1 a z
2 2 b x
3 0 b x
4 0 b x
5 1 b z
6 0 c x
7 0 c y
8 3 c z
9 3 c z
dfm_two
data group_a group_b
0 4 a x
1 4 a y
2 4 b x
3 4 b x
4 4 b y
5 1 b y
6 1 b z
7 1 c x
8 4 c y
9 3 c z
10 2 c z
As output I want a boolean column that indicates for dfm_one whether there is a matching data entry (i.e. has the same vale) in dfm_two for each group_a group_b combination.
So my expected output is:
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 True
9 True
I'm guessing the code should look something like:
dfm_one.groupby(['group_a','group_b']).apply(lambda x: ??)
and that the function inside apply
should make use of the isin
method.
Another solution might be to merge the two datasets but I think this is not trivial since there is no unique identifier in the DataFrame
.
Upvotes: 1
Views: 83
Reputation: 394099
OK this is a slight hack, if we cast the df to str
dtype then we can call sum
to concatenate the rows into a string, we can use the resultant string as a kind of unique identifier and then call isin
on the other df, again converting to a str
:
In [91]:
dfm_one.astype(str).sum(axis=1).isin(dfm_two.astype(str).sum(axis=1))
Out[91]:
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 True
9 True
dtype: bool
Output from the conversions:
In [92]:
dfm_one.astype(str).sum(axis=1)
Out[92]:
0 3az
1 1az
2 2bx
3 0bx
4 0bx
5 1bz
6 0cx
7 0cy
8 3cz
9 3cz
dtype: object
In [93]:
dfm_two.astype(str).sum(axis=1)
Out[93]:
0 4ax
1 4ay
2 4bx
3 4bx
4 4by
5 1by
6 1bz
7 1cx
8 4cy
9 3cz
10 2cz
dtype: object
Upvotes: 1