Stacey
Stacey

Reputation: 5097

pandas dataframe check specific columns for same values

Is there a way to check and sum specific dataframe columns for the same values.

For example in the following dataframe

column name 1, 2, 3, 4, 5
            -------------
            a, g, h, t, j 
            b, a, o, a, g
            c, j, w, e, q
            d, b, d, q, i

when comparing columns 1 and 2 the sum of values that are the same is 2 (a and b)

Thanks

Upvotes: 1

Views: 1994

Answers (1)

EdChum
EdChum

Reputation: 393963

You can use isin and sum to achieve this:

In [96]:
import pandas as pd
import io
t="""1, 2, 3, 4, 5
a, g, h, t, j 
b, a, o, a, g
c, j, w, e, q
d, b, d, q, i"""
df = pd.read_csv(io.StringIO(t), sep=',\s+')
df

Out[96]:
   1  2  3  4  5
0  a  g  h  t  j
1  b  a  o  a  g
2  c  j  w  e  q
3  d  b  d  q  i

In [100]:    
df['1'].isin(df['2']).sum()

Out[100]:
2

isin will produce a boolean series, calling sum on a boolean series converts True and False to 1 and 0 respectively:

In [101]:
df['1'].isin(df['2'])

Out[101]:
0     True
1     True
2    False
3    False
Name: 1, dtype: bool

EDIT

To check and count the number of values that are present in all columns of interest the following would work, note that for your dataset there are no values that are present in all columns:

In [123]:
df.ix[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1).sum()

Out[123]:
0

Breaking the above down will show what each step is doing:

In [124]:    
df.ix[:, :'4'].apply(lambda x: x.isin(df['1']))

Out[124]:
      1      2      3      4
0  True  False  False  False
1  True   True  False   True
2  True  False  False  False
3  True   True   True  False

In [125]:    
df.ix[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1)

Out[125]:
0    False
1    False
2    False
3    False
dtype: bool

Upvotes: 2

Related Questions