g_ret3

Reputation: 45

Pandas: find common values across columns

I have the following dataframe:

df = pd.DataFrame({'TX':['bob','tim','frank'],'IL':['fred','bob','tim'],'NE':['tim','joe','bob']})

I would like to isolate the strings that occur across all columns to generate a list. The expected result is:

output = ['tim','bob']

The only way I can think of to achieve this is using for loops, which I would like to avoid. Is there a built-in pandas function suited to accomplishing this?
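For reference, the loop-based baseline the question alludes to (a sketch, not the asker's actual code, assuming the df above) could look like:

```python
import pandas as pd

df = pd.DataFrame({'TX': ['bob', 'tim', 'frank'],
                   'IL': ['fred', 'bob', 'tim'],
                   'NE': ['tim', 'joe', 'bob']})

# keep each name from the first column that also appears in every other column
common = [name for name in df.iloc[:, 0]
          if all(name in set(df[col].values) for col in df.columns[1:])]
print(common)  # ['bob', 'tim']
```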

Upvotes: 3

Views: 3570

Answers (3)

jezrael

Reputation: 862671

You can create a mask by counting values per column and then test which names are present (not missing) in every column with DataFrame.all:

m = df.apply(pd.Series.value_counts).notna()
print(m)
          TX     IL     NE
bob     True   True   True
frank   True  False  False
fred   False   True  False
joe    False  False   True
tim     True   True   True

L = m.index[m.all(axis=1)].tolist()
print(L)
['bob', 'tim']
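To see why .notna() works here, the intermediate counts table (a sketch, assuming the same df as in the question) contains NaN wherever a name is absent from a column:

```python
import pandas as pd

df = pd.DataFrame({'TX': ['bob', 'tim', 'frank'],
                   'IL': ['fred', 'bob', 'tim'],
                   'NE': ['tim', 'joe', 'bob']})

counts = df.apply(pd.Series.value_counts)
print(counts)
# a name missing from a column shows up as NaN, so .notna() marks presence
```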

Upvotes: 8

Umar.H

Reputation: 23099

IIUC, you can stack all your columns vertically and then use value_counts to count the occurrences of each name; we'll store that in a Series called s.

We then take all names whose count equals the maximum number of occurrences, in this instance 3. The cell values become the index thanks to using stack.

s = df.stack().value_counts()
# or, if you want to ignore duplicates within a column:
# df.stack().groupby(level=1).unique().explode().value_counts()

print(s)

tim      3
bob      3
frank    1
fred     1
joe      1

s1 = s[s.eq(s.max())].index.tolist()

print(s1)

['tim', 'bob']
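Note that s.max() equals the number of columns only when at least one name actually appears in every column. A safer variant (a sketch, assuming no duplicates within a column) compares the counts against df.shape[1] directly:

```python
import pandas as pd

df = pd.DataFrame({'TX': ['bob', 'tim', 'frank'],
                   'IL': ['fred', 'bob', 'tim'],
                   'NE': ['tim', 'joe', 'bob']})

s = df.stack().value_counts()
# a name is common to all columns when its count equals the column count
common = s[s.eq(df.shape[1])].index.tolist()
print(common)  # contains 'tim' and 'bob' (order may vary)
```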

Upvotes: 1

Jaroslav Bezděk

Reputation: 7625

You can achieve this by combining pandas.DataFrame.apply() and set.intersection(), like this:

cols_set = list(df.apply(lambda col: set(col.values)).values)
output = list(set.intersection(*cols_set))

The result is the following:

>>> print(output)
['tim', 'bob']

Upvotes: 1
