Reputation: 5097
Is there a way to check and sum specific dataframe columns for the same values.
For example in the following dataframe
column name 1, 2, 3, 4, 5
-------------
a, g, h, t, j
b, a, o, a, g
c, j, w, e, q
d, b, d, q, i
when comparing columns 1 and 2 the sum of values that are the same is 2 (a and b)
Thanks
Upvotes: 1
Views: 1994
Reputation: 393963
You can use isin
and sum
to achieve this:
In [96]:
import pandas as pd
import io
t="""1, 2, 3, 4, 5
a, g, h, t, j
b, a, o, a, g
c, j, w, e, q
d, b, d, q, i"""
df = pd.read_csv(io.StringIO(t), sep=',\s+')
df
Out[96]:
1 2 3 4 5
0 a g h t j
1 b a o a g
2 c j w e q
3 d b d q i
In [100]:
df['1'].isin(df['2']).sum()
Out[100]:
2
isin
will produce a boolean series, calling sum on a boolean series converts True
and False
to 1
and 0
respectively:
In [101]:
df['1'].isin(df['2'])
Out[101]:
0 True
1 True
2 False
3 False
Name: 1, dtype: bool
EDIT
To check and count the number of values that are present in all columns of interest the following would work, note that for your dataset there are no values that are present in all columns:
In [123]:
df.ix[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1).sum()
Out[123]:
0
Breaking the above down will show what each step is doing:
In [124]:
df.ix[:, :'4'].apply(lambda x: x.isin(df['1']))
Out[124]:
1 2 3 4
0 True False False False
1 True True False True
2 True False False False
3 True True True False
In [125]:
df.ix[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1)
Out[125]:
0 False
1 False
2 False
3 False
dtype: bool
Upvotes: 2