Reputation: 565
I have for example 3 different dataframes, each with different columns, except one column with the same title that exists in all of the 3 dataframes. In this column, I have for example names, and I want to count how many times a name repeats in the different dataframes.
So for example those are the 3 dataframes:
df1:
col1 col2 names col3
a a bbb a
a a ccc a
a a bbb a
df2:
col4 col5 names col6
a a bbb a
a a zzz a
a a qqq a
df3:
col7 col8 names col9
a a zzz a
a a zzz a
a a rrr a
so the output dataframe would be:
names df1 df2 df3 total
bbb V V X 2
ccc V X X 1
zzz X V V 2
qqq X V X 1
rrr X X V 1
So the output dataframe checks which name exists in any other df, and the last column counts how many times it appears.
Is there an easy way with pandas to do this searching and counting, or do I have to switch it into lists and starts iterating through the lists?
Upvotes: 1
Views: 385
Reputation: 862511
You can use concat
+ get_dummies
+ groupby
+ reset_index
+ max
+ T
+ sum
+ replace
:
dfs = [df1, df2, df3]
names = ['df1', 'df2', 'df3']
df = pd.concat([df['names'] for df in dfs], keys=names)
df = pd.get_dummies(df.reset_index(level=1, drop=True)).groupby(level=0).max().T
df['total'] = df.sum(axis=1)
df[names] = df[names].replace({0:'X', 1:'V'})
print (df)
df1 df2 df3 total
bbb V V X 2
ccc V X X 1
qqq X V X 1
rrr X X V 1
zzz X V V 2
Upvotes: 2