Reputation: 987
Let's say I have the two following pandas.DataFrame
, df0
and df1
:
import pandas as pd
# 1st data set
dat0 = [['A0', 'B0', 'C0', 'case0', 1],
['A0', 'B2', 'C0', 'case0', 2],
['A1', 'B0', 'C0', 'case0', 3],
['A1', 'B1', 'C0', 'case0', 4],
['A0', 'B0', 'C1', 'case0', 5]]
df0 = pd.DataFrame(dat0, columns=['colA', 'colB', 'colC', 'colCase', 'colVal'])
# 2nd data set
dat1 = [['A0', 'B1', 'C2', 'case1', 6],
['A0', 'B2', 'C2', 'case1', 7],
['A1', 'B0', 'C2', 'case1', 8],
['A2', 'B2', 'C2', 'case1', 9]]
df1 = pd.DataFrame(dat1, columns=['colA', 'colB', 'colC', 'colCase', 'colVal'])
I am trying to get all the lines that, for each couple of columns (A, B) have a value for both case0 and case1 (ie a value in each DF).
In case it matters, I cannot have "duplicate" lines in the DataFrames: the set (A, B, C, case) is unique in each DF.
So I would like to reach a code that would look like:
# Merge the DataFrames
df = pd.concat([df0, df1]) # maybe concat is not a good starting point
for a in ['A0', 'A1', 'A2']:
for b in ['B0', 'B1', 'B2']:
table = my_great_function(df, a, b)
if table:
print '---'
print table
And get the following result:
---
colA colB colC colCase colVal
1 A0 B2 C0 case0 2
1 A0 B2 C2 case1 7
---
colA colB colC colCase colVal
2 A1 B0 C0 case0 3
2 A1 B0 C2 case1 8
Note I don't want to get results where only one line appears, nor where two or more lines appear with the same case.
Any suggestion?
Upvotes: 2
Views: 77
Reputation: 214957
You can do something similar to this. Modify the condition accordingly if that is not exactly what you want (here each g represents a sub data frame with a unique combination of colA and colB):
for _, g in df.groupby(["colA", "colB"]):
case_lst = g.colCase.tolist()
if 'case0' in case_lst and 'case1' in case_lst:
print("--------")
print(g)
--------
colA colB colC colCase colVal
1 A0 B2 C0 case0 2
1 A0 B2 C2 case1 7
--------
colA colB colC colCase colVal
2 A1 B0 C0 case0 3
2 A1 B0 C2 case1 8
Upvotes: 3