Creating new column based on other column values with condition

Question

I have a column with values:

brand
Brand1
Brand2

Brand3

data.brand = data.brand.astype(str)
data.brand = data.brand.replace(r'^\s*$', np.nan, regex=True)
data['branded'] = np.where(data['brand']!= 'nan', True, False)

After the first run of the code I get these results:

brand	branded
Brand1	TRUE
Brand2	TRUE
nan	TRUE
Brand3	TRUE

After a second run of the same code I get the desired results:

brand	branded
Brand1	TRUE
Brand2	TRUE
nan	FALSE
Brand3	TRUE

What would be a smarter way to handle or avoid this problem?

anky · Accepted Answer

This answer focuses only on why the first run did not work.

In your code, when you replace the blank values in data.brand with the regex, you replace them with np.nan, which is not the string 'nan'. On the first run, the condition in the next line, np.where(data['brand'] != 'nan', True, False), therefore cannot pick out that row: np.nan != 'nan' evaluates to true, so the row is flagged True. On the second run, the row already holds np.nan, and the .astype(str) in the first line converts np.nan to the string 'nan', so the third line works.
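The difference between the two values can be checked outside pandas. The snippet below is a minimal sketch using only NumPy and plain Python; the variable names are illustrative, and it assumes that astype(str) in the pandas version used in the question performs the same float-to-text conversion element by element.

```python
import numpy as np

# np.nan is a float, so it never equals the string 'nan'.
missing = np.nan
print(missing != "nan")   # True: the comparison np.where sees on the first run

# Converting it to text is what produces the literal string 'nan'.
as_text = str(missing)
print(as_text != "nan")   # False: the comparison np.where sees on the second run
```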

Solution:

Replace:

data.brand = data.brand.replace(r'^\s*$', np.nan, regex=True)

With:

data.brand = data.brand.replace(r'^\s*$', 'nan', regex=True)

This sets the replacement value to the string 'nan' from the start, so the third line works on the first run.
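As a rough check of this fix, the sketch below wraps the three lines in a helper and applies it twice. The function name flag_branded and the sample frame are assumptions made for illustration; the frame mirrors the question's column, with an empty string in the third row.

```python
import numpy as np
import pandas as pd


def flag_branded(data: pd.DataFrame) -> pd.DataFrame:
    """Apply the three lines from the question, using the 'nan' string replacement."""
    data = data.copy()
    data["brand"] = data["brand"].astype(str)
    # Blank or whitespace-only cells become the string 'nan', not np.nan.
    data["brand"] = data["brand"].replace(r"^\s*$", "nan", regex=True)
    data["branded"] = np.where(data["brand"] != "nan", True, False)
    return data


# Sample frame mirroring the question: the third brand is an empty string.
data = pd.DataFrame({"brand": ["Brand1", "Brand2", "", "Brand3"]})

first = flag_branded(data)
second = flag_branded(first)  # running the same code again changes nothing

print(first)
print(second["branded"].equals(first["branded"]))
```

Another option, not part of this answer, would be to keep np.nan as the replacement and build the flag with data['brand'].notna(), which avoids comparing against a sentinel string altogether.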
