Python Pandas match dataframes

Question

I have for example 3 different dataframes, each with different columns, except one column with the same title that exists in all of the 3 dataframes. In this column, I have for example names, and I want to count how many times a name repeats in the different dataframes.

So for example those are the 3 dataframes:

df1:

col1   col2   names   col3
 a       a     bbb      a
 a       a     ccc      a
 a       a     bbb      a

df2:

col4   col5   names   col6
 a       a     bbb      a
 a       a     zzz      a
 a       a     qqq      a

df3:

col7   col8   names   col9
 a       a     zzz      a
 a       a     zzz      a
 a       a     rrr      a

so the output dataframe would be:

names    df1    df2    df3   total
 bbb      V      V      X      2
 ccc      V      X      X      1
 zzz      X      V      V      2
 qqq      X      V      X      1
 rrr      X      X      V      1

So the output dataframe checks which name exists in any other df, and the last column counts how many times it appears.

Is there an easy way with pandas to do this searching and counting, or do I have to switch it into lists and starts iterating through the lists?

jezrael · Accepted Answer

You can use concat + get_dummies + groupby + reset_index + max + T + sum + replace:

dfs = [df1, df2, df3]
names = ['df1', 'df2', 'df3']
df = pd.concat([df['names'] for df in dfs], keys=names)
df = pd.get_dummies(df.reset_index(level=1, drop=True)).groupby(level=0).max().T
df['total'] = df.sum(axis=1)
df[names] = df[names].replace({0:'X', 1:'V'})
print (df)
    df1 df2 df3  total
bbb   V   V   X      2
ccc   V   X   X      1
qqq   X   V   X      1
rrr   X   X   V      1
zzz   X   V   V      2

Python Pandas match dataframes

Answers (1)

Related Questions