Reputation: 69
I have this task where I need to create a dataset based on two other connected datasets.
df = pd.DataFrame(columns=['ID','P1','P2'],
data=[[1, 2, 0], [2,1,0], [3, 1, 2], [4, 2, 1],
[5, 1, 2], [6, 0, 1], [7, 1, 0]])
fp = pd.DataFrame(columns=['ID','FP'],
data=[[1, 'fp'], [2,'i'], [3, 'i'], [4, 'fp'],
[5, 'fp'], [6, 'fp'], [7, 'i']])
My task is to create a third dataset which only contains the id, p1, and p2 from the df dataset if the fp data set 'FP' column shows 'fp'.
I tried this
df2 = np.where((df['ID']==fp['ID'])&fp['FP']=='fp)
But it didn't work
Upvotes: 0
Views: 380
Reputation: 20659
You can use Series.isin
with boolean indexing
.
idx = fp['ID'][fp['FP'].eq('fp')]
df.loc[df['ID'].isin(idx)]
ID P1 P2
0 1 2 0
3 4 2 1
4 5 1 2
5 6 0 1
Upvotes: 0
Reputation: 942
I am sure there is a better way than mine, but this is what I would do
import pandas as pd
df = pd.DataFrame(columns=['ID','P1','P2'],
data=[[1, 2, 0], [2,1,0], [3, 1, 2], [4, 2, 1],
[5, 1, 2], [6, 0, 1], [7, 1, 0]])
fp = pd.DataFrame(columns=['ID','FP'],
data=[[1, 'fp'], [2,'i'], [3, 'i'], [4, 'fp'],
[5, 'fp'], [6, 'fp'], [7, 'i']])
# Merging dataframes
res = df.merge(fp)
# Filtering
res = res[res['FP'] == 'fp'].drop(columns=['FP'])
res
Result
ID P1 P2
0 1 2 0
3 4 2 1
4 5 1 2
5 6 0 1
Upvotes: 1