Marina

Reputation: 349

Python DataFrame - Select dataframe rows based on values in another dataframe

I'm struggling with a dataframe-related problem. There are two dataframes, df and dff, as below:

import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'],
            ['row1', 1, 2],
            ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:,1:].astype(int), index=data[1:,0],columns=data[0,1:])


filters=np.array([['', 'col1', 'col2'],
                 ['row1', 1, 1],
                 ['row2', 1, 2],
                 ['row3', 3, 2]])
dff = pd.DataFrame(data=filters[1:,1:].astype(int), index=filters[1:,0],columns=filters[0,1:])

I wish to select rows from df such that their col2 value belongs to the list of col2 values found in dff for the matching col1 value. For example, for a col1 value of 1 that list should be [1, 2], and for a col1 value of 3 the list is [2].

My best attempt to solve this is

df1 = df[df['col2'].isin(dff[dff['col1']==df['col1']]['col2'])]

But that results in

ValueError: Can only compare identically-labeled Series objects

Any help would be appreciated. Thanks so much.

Upvotes: 1

Views: 1168

Answers (1)

rafaelc

Reputation: 59304

As far as I understand, you can simply aggregate the col2 values of dff into one list per col1 value

ndf = dff.groupby('col1').agg(lambda x: list(x)).reset_index()

    col1   col2
0   1      [1, 2]
1   3      [2]

and then filter out the col1 values that are not in df

ndf[ndf.col1.isin(df.col1)]
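
If the goal is to get the matching rows of df itself, the aggregation above can be taken one step further. The following is only a sketch, assuming the same df and dff as in the question (rebuilt here from plain dicts for brevity); the names allowed, mask, pairs, selected and selected_merge are my own and not from the original post. It shows two possible routes: a lookup against the per-col1 lists, and an inner merge on the (col1, col2) pairs.

```python
import pandas as pd

# Same data as in the question, built from dicts instead of a NumPy array.
df = pd.DataFrame({'col1': [1, 3], 'col2': [2, 4]},
                  index=['row1', 'row2'])
dff = pd.DataFrame({'col1': [1, 1, 3], 'col2': [1, 2, 2]},
                   index=['row1', 'row2', 'row3'])

# Route 1: one list of allowed col2 values per col1 value,
# then test each row of df against the list for its own col1.
allowed = dff.groupby('col1')['col2'].agg(list)
mask = [c2 in allowed.get(c1, [])
        for c1, c2 in zip(df['col1'], df['col2'])]
selected = df[mask]

# Route 2: inner merge on the (col1, col2) pairs present in dff.
# reset_index/set_index keeps the original row labels of df.
pairs = dff[['col1', 'col2']].drop_duplicates()
selected_merge = (df.reset_index()
                    .merge(pairs, on=['col1', 'col2'], how='inner')
                    .set_index('index'))

print(selected)
print(selected_merge)
```

With the sample data, both routes should keep only row1, since (1, 2) appears in dff while (3, 4) does not. The merge route avoids the Python-level loop, so it is probably the better fit for larger frames.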

Upvotes: 1
