Reputation: 51
I have a column with values:
brand |
---|
Brand1 |
Brand2 |
 |
Brand3 |
data.brand = data.brand.astype(str)
data.brand = data.brand.replace(r'^\s*$', np.nan, regex=True)
data['branded'] = np.where(data['brand']!= 'nan', True, False)
After the first run of the code I get these results:
brand | branded |
---|---|
Brand1 | TRUE |
Brand2 | TRUE |
nan | TRUE |
Brand3 | TRUE |
After the second run of the same code I get the desired results:
brand | branded |
---|---|
Brand1 | TRUE |
Brand2 | TRUE |
nan | FALSE |
Brand3 | TRUE |
What would be a smarter way to avoid this problem?
Upvotes: 0
Views: 54
Reputation: 75140
This answer focuses only on why the first run did not work.
In your code, when you replace data.brand using the regex, the replacement value is np.nan, which is a float and not the string 'nan'. On the first run, the condition in the next line, np.where(data['brand']!= 'nan', True, False), therefore cannot identify the row. On the second run, the row already holds np.nan, and the .astype(str) in the first line converts it to the string 'nan', so the third line works.
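The difference between the two runs can be reproduced on a single value. This is only a minimal sketch: the variable names are made up for illustration, and str() on a scalar stands in for what .astype(str) does to the missing row as described above.

```python
import numpy as np

# The value that replace() writes into the empty row on the first run.
missing = np.nan

# First run: a float NaN is compared with the string 'nan'.
# They are different objects of different types, so != is True.
first_run = missing != "nan"

# Second run: the NaN has been converted to text first,
# so the comparison is between 'nan' and 'nan'.
second_run = str(missing) != "nan"

print(first_run, second_run)
```

The first comparison marks the row as branded, the second does not, which matches the two result tables in the question.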
Solution:
Replace:
data.brand = data.brand.replace(r'^\s*$', np.nan, regex=True)
With:
data.brand = data.brand.replace(r'^\s*$', 'nan', regex=True)
This sets the replacement value to the string 'nan' from the start, so the third line works on the first run.
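As an alternative that is not part of the fix above, the string comparison can be avoided altogether by keeping real missing values and testing for them with notna(). This is a sketch under assumptions: the sample frame is hypothetical, with an empty string standing in for the blank row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the third value is an empty string.
data = pd.DataFrame({"brand": ["Brand1", "Brand2", "", "Brand3"]})

# Turn empty or whitespace-only strings into real missing values.
data["brand"] = data["brand"].replace(r"^\s*$", np.nan, regex=True)

# notna() is True wherever a value is present, False for missing values.
data["branded"] = data["brand"].notna()

print(data)
```

Because there is no conversion to str, running these lines a second time should leave the branded column unchanged.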
Upvotes: 2