Reputation: 15
Here is the DataFrame when I am doing Python using pandas.
import pandas as pd
import numpy as np
items = ['Apple','Cherry','Banana','Apple','Banana']
place = ['UK','','China','USA','China']
quality = ['Bad','Bad','Good','Bad','Good']
date = ['Jan-1','Jan-2','Jan-3','','Jan-1']
df = pd.DataFrame({"Item":items, "Place":place, "Quality":quality, "Date":date})
print(df)
Item | Place | Quality | Date |
---|---|---|---|
Apple | UK | Bad | Jan-1 |
Cherry | Bad | Jan-2 | |
Banana | China | Good | Jan-3 |
Apple | USA | Bad | |
Banana | Chian | Good | Jan-1 |
What I require is:
Place
is null and column Date
is not null, then transfer the value from Date
to Place
;Place
is not null and column Date
is null, then no transferring happens;Place
is and column Date
is are both null or not null, then no transferring happens, either.So I tried:
df.Place=np.where(df.Date!="",df.Date,df.Place)
but this can not be satisfied with what I require. The output should be like this:
Item | Place | Quality | Date |
---|---|---|---|
Apple | UK | Bad | Jan-1 |
Cherry | Jan-2 | Bad | Jan-2 |
Banana | China | Good | Jan-3 |
Apple | USA | Bad | |
Banana | Chian | Good | Jan-1 |
Can someone help me with this solution? Thanks
Upvotes: 1
Views: 849
Reputation:
The conditions of np.where
seems to be incomplete:
df = df.replace('',np.nan)
df['Place'] = np.where(df['Place'].isna() & df['Date'].notna(), df['Date'], df['Place'])
or without replacing ''
with NaN
, just write the complete condition:
df.Place = np.where((df.Date!="") & (df.Place==""), df.Date, df.Place)
As @mozway explains, note that df.Date!=""
is redundant because if it's False, then we select from df.Place
(regardless if it's ""
or not) and if it's True, it depends on if df.Place==""
. So what really decides the selection is df.Place==""
. Then we can write the above simply:
df.Place = np.where(df.Place=="", df.Date, df.Place)
Output:
Item Place Quality Date
0 Apple UK Bad Jan-1
1 Cherry Jan-2 Bad Jan-2
2 Banana China Good Jan-3
3 Apple USA Bad NaN
4 Banana China Good Jan-1
Upvotes: 3
Reputation: 260780
You can replace the empty strings with NaN and fillna
with your other column:
df['Place'] = df['Place'].replace('', float('nan')).fillna(df['Date'])
The correct code with numpy.where
would have been:
df['Place'] = np.where(df.Place!='', df.Place, df.Date)
output:
Item Place Quality Date
0 Apple UK Bad Jan-1
1 Cherry Jan-2 Bad Jan-2
2 Banana China Good Jan-3
3 Apple USA Bad
4 Banana China Good Jan-1
Upvotes: 2