Song Mei
Song Mei

Reputation: 69

how to create a 3rd dataset based on conditions from other 2 datasets in python

I have this task where I need to create a dataset based on two other connected datasets.

df = pd.DataFrame(columns=['ID','P1','P2'],
               data=[[1, 2, 0], [2,1,0], [3, 1, 2], [4, 2, 1],
                    [5, 1, 2], [6, 0, 1], [7, 1, 0]])
fp = pd.DataFrame(columns=['ID','FP'],
               data=[[1, 'fp'], [2,'i'], [3, 'i'], [4, 'fp'],
                    [5, 'fp'], [6, 'fp'], [7, 'i']])

My task is to create a third dataset which only contains the id, p1, and p2 from the df dataset if the fp data set 'FP' column shows 'fp'.

I tried this

df2 = np.where((df['ID']==fp['ID'])&fp['FP']=='fp)

But it didn't work

Upvotes: 0

Views: 380

Answers (2)

Ch3steR
Ch3steR

Reputation: 20659

You can use Series.isin with boolean indexing.

idx = fp['ID'][fp['FP'].eq('fp')]
df.loc[df['ID'].isin(idx)]

   ID  P1  P2
0   1   2   0
3   4   2   1
4   5   1   2
5   6   0   1

Upvotes: 0

N. Arunoprayoch
N. Arunoprayoch

Reputation: 942

I am sure there is a better way than mine, but this is what I would do

import pandas as pd
df = pd.DataFrame(columns=['ID','P1','P2'],
               data=[[1, 2, 0], [2,1,0], [3, 1, 2], [4, 2, 1],
                    [5, 1, 2], [6, 0, 1], [7, 1, 0]])
fp = pd.DataFrame(columns=['ID','FP'],
               data=[[1, 'fp'], [2,'i'], [3, 'i'], [4, 'fp'],
                    [5, 'fp'], [6, 'fp'], [7, 'i']])

# Merging dataframes
res = df.merge(fp)

# Filtering
res = res[res['FP'] == 'fp'].drop(columns=['FP'])
res

Result

    ID  P1  P2
0   1   2   0
3   4   2   1
4   5   1   2
5   6   0   1

Upvotes: 1

Related Questions