Reputation: 297
I need to copy a column's field into a variable, based on a specific condition, and then delete it.
This dataframe contains data of some kids, that have their favourite toy and colour associated:
data = {'Kid': ['Richard', 'Daphne', 'Andy', 'May', 'Claire', 'Mozart', 'Jane'],
'Toy': ['Ball', 'Doll', 'Car', 'Barbie', 'Frog', 'Bear', 'Doll'],
'Colour': ['white', np.nan, 'red', 'pink', 'green', np.nan, np.nan]
}
df = pd.DataFrame (data, columns = ['Kid', 'Toy','Colour'])
print (df)
The dataframe looks like this:
Kid Toy Colour
0 Richard Ball white
1 Daphne Doll NaN
2 Andy Car red
3 May Barbie pink
4 Claire Frog green
5 Mozart Bear NaN
6 Jane Doll NaN
The condition is: If a kid does have a toy, but it does not have a colour, then save both the kid and the toy in a separate array as follows and maintain the order/matching:
toy_array = ["Doll", "Bear", "Doll"]
kid_array = ["Daphne", "Mozart", "Jane"]
And then delete the toy from the dataframe. So the final dataframe should look like this:
Kid Toy Colour
0 Richard Ball white
1 Daphne NaN NaN
2 Andy Car red
3 May Barbie pink
4 Claire Frog green
5 Mozart NaN NaN
6 Jane NaN NaN
I got inspired by many sources, along with this one, and I tried this:
kid_array.append(df.loc[(df['Toy'] != np.nan) & (df['Colour'] == np.nan)])
print(kid_array)
I am at the very beginning, I highly appreciate all your help if you could possibly help me!
Upvotes: 3
Views: 105
Reputation: 862511
Test missing and no misisng values by Series.isna
and
Series.notna
and then set missing values to Toy
column by DataFrame.loc
:
mask = df['Toy'].notna() & df['Colour'].isna()
df.loc[mask, 'Toy'] = np.nan
Or in Series.mask
:
df['Toy'] = df['Toy'].mask(mask)
Or by numpy.where
:
df['Toy'] = np.where(mask, np.nan, df['Toy'])
print (df)
Kid Toy Colour
0 Richard Ball white
1 Daphne NaN NaN
2 Andy Car red
3 May Barbie pink
4 Claire Frog green
5 Mozart NaN NaN
6 Jane NaN NaN
If need lists:
toy_array = df.loc[mask, 'Toy'].tolist()
kid_array = df.loc[mask, 'Kid'].tolist()
print (toy_array)
['Doll', 'Bear', 'Doll']
print (kid_array)
['Daphne', 'Mozart', 'Jane']
Upvotes: 1
Reputation: 128
Your logic is correct, just the function to compare needs to be matched with the correct function used for comparison in Numpy Library
numpy.isnan()
Try the following code
kid_array.append(df.loc[(!numpy.isnan( df['Toy'])) & (!numpy.isnan(df['Colour']))])
Upvotes: 0