Reputation: 41
I'm a newbie to pandas, and suspect this has a simple answer, but it's stumping me right now. I want to select the rows from multiple dataframes (with same columns) where a value in one column occurs in all of the dataframes.
So if I have the following:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Col1': ['Adams','Baker','Cash','Jones','Thomas'],
                    'Col2': ['A','B','C','D','E'], 'Col3': [10,11,12,13,14]})
df2 = pd.DataFrame({'Col1': ['Adams','Cash','Fox','Jones','Johnson'],
                    'Col2': ['D','E','F','G','H'], 'Col3': [40,50,60,70,80]})
df3 = pd.DataFrame({'Col1': ['Adams','Barns','Jones','Smith','Thomas'],
                    'Col2': ['F','G','J','L','M'], 'Col3': [21,21,22,23,24]})
print(df1)
print(df2)
print(df3)
Giving:
Col1 Col2 Col3
0 Adams A 10
1 Baker B 11
2 Cash C 12
3 Jones D 13
4 Thomas E 14
Col1 Col2 Col3
0 Adams D 40
1 Cash E 50
2 Fox F 60
3 Jones G 70
4 Johnson H 80
Col1 Col2 Col3
0 Adams F 21
1 Barns G 21
2 Jones J 22
3 Smith L 23
4 Thomas M 24
I want to end up with:
Adams A 10
Adams D 40
Adams F 21
Jones D 13
Jones G 70
Jones J 22
Is there a succinct way to do this?
Upvotes: 4
Views: 7946
Reputation: 109546
First, create a container to hold your dataframes (dfs). You can use reduce on the set intersection of names in Col1 to find the names that appear in all of the dataframes. Lastly, you can use query together with concat to join the relevant rows.
from functools import reduce  # reduce is a builtin in Python 2; in Python 3 it must be imported

dfs = (df1, df2, df3)
common_names = \
    reduce(lambda s, df: s.intersection(set(df.Col1)),  # The set intersection of names.
           dfs[1:],            # Sequence of dataframes to pass through reduce, starting with the second.
           set(dfs[0].Col1))   # Seed reduce with the column of names from the first dataframe.
>>> common_names
{'Adams', 'Jones'}
>>> pd.concat((df.query('Col1 in @common_names') for df in dfs),
...           ignore_index=True).sort_values('Col1')
Col1 Col2 Col3
0 Adams A 10
2 Adams D 40
4 Adams F 21
1 Jones D 13
3 Jones G 70
5 Jones J 22
This solution scales nicely, as you only need to change the contents of dfs should the number of dataframes change.
P.S. After re-reading the question, I realized you're a Pandas newbie. Welcome! Although Pandas is incredibly powerful, there is no simple solution to your question. Basically, you want to intersect the name columns to get the set of common names, then filter each dataframe for those names, and concatenate the results.
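If you'd like that pipeline as a single reusable helper, here is a minimal sketch (the function name rows_with_common_values and its arguments are mine, not part of this answer):
def rows_with_common_values(frames, col='Col1'):
    # Names that appear in every dataframe.
    common = set.intersection(*(set(f[col]) for f in frames))
    # Keep only the rows whose name is in that common set, then stack the results.
    return pd.concat([f[f[col].isin(common)] for f in frames], ignore_index=True)

result = rows_with_common_values([df1, df2, df3]).sort_values('Col1')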
Upvotes: 0
Reputation: 76297
You can first find the common values:
common = set.intersection(set(df1.Col1), set(df2.Col1), set(df3.Col1))
Then concatenate the rows whose values are within the set of common values:
pd.concat([
    df1[df1.Col1.isin(common)],
    df2[df2.Col1.isin(common)],
    df3[df3.Col1.isin(common)]]).sort_values(by='Col1')
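If you also want a clean 0..n-1 index on the result, one small extra step (my addition, not part of the original answer) is to reset the index after sorting:
result = (pd.concat([df1[df1.Col1.isin(common)],
                     df2[df2.Col1.isin(common)],
                     df3[df3.Col1.isin(common)]])
            .sort_values(by='Col1')
            .reset_index(drop=True))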
Upvotes: 5
Reputation: 769
One solution would be to first get the values of Col1 that exist in all of your DataFrames, then restrict each DataFrame to those common values, and finally concatenate the restricted DataFrames:
common_values = list(set(df1.Col1) & set(df2.Col1) & set(df3.Col1))
df = pd.concat([df1[df1.Col1.isin(common_values)],
                df2[df2.Col1.isin(common_values)],
                df3[df3.Col1.isin(common_values)]], ignore_index=True)
This will give you the following result:
print(df)
Col1 Col2 Col3
Adams A 10
Jones D 13
Adams D 40
Jones G 70
Adams F 21
Jones J 22
I also noticed that your desired DataFrame is sorted; if that is the case, you can sort it with the following command:
df = df.sort_values(['Col1', 'Col2', 'Col3'])
This will give you:
print(df)
Col1 Col2 Col3
Adams A 10
Adams D 40
Adams F 21
Jones D 13
Jones G 70
Jones J 22
Upvotes: 0