A99

Reputation: 51

Creating new column based on other column values with condition

I have a column with values:

brand
Brand1
Brand2
Brand3

I run this code on it:

data.brand = data.brand.astype(str)
data.brand = data.brand.replace(r'^\s*$', np.nan, regex=True)
data['branded'] = np.where(data['brand']!= 'nan', True, False)

After the first run of the code I get these results:

brand branded
Brand1 TRUE
Brand2 TRUE
nan TRUE
Brand3 TRUE

After a second run of the same code I get the desired results:

brand branded
Brand1 TRUE
Brand2 TRUE
nan FALSE
Brand3 TRUE

What would be a smarter way to avoid this problem?

Upvotes: 0

Views: 54

Answers (1)

anky

Reputation: 75140

This answer focuses only on why the first run did not work.

In your code, the replace call substitutes np.nan, a float missing value, which is not the string 'nan'. On the first run, the comparison in the next line, np.where(data['brand']!= 'nan', True, False), therefore evaluates to True for that row. On the second run, the row already holds np.nan, and the .astype(str) in the first line converts it to the string 'nan'. The regex does not match 'nan', so the value stays as it is and the third line works.
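The difference between the float np.nan and the string 'nan' can be checked directly. The following is a minimal sketch, assuming a small made-up frame with one whitespace-only entry (the question does not show the original data); it reproduces only the first run.

```python
import numpy as np
import pandas as pd

# The missing-value marker and the string "nan" are different things.
nan_is_not_string = np.nan != "nan"   # float NaN compared with a str
str_of_nan = str(np.nan)              # string conversion of NaN

# Hypothetical data: the third entry is whitespace only.
data = pd.DataFrame({"brand": ["Brand1", "Brand2", " ", "Brand3"]})

# First run of the three lines from the question.
data["brand"] = data["brand"].astype(str)
data["brand"] = data["brand"].replace(r"^\s*$", np.nan, regex=True)
data["branded"] = np.where(data["brand"] != "nan", True, False)

# The blank row now holds a real missing value, yet is flagged as branded.
first_run = data["branded"].tolist()
print(nan_is_not_string, str_of_nan, first_run)
```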

Solution:

Replace:

data.brand = data.brand.replace(r'^\s*$', np.nan, regex=True)

With:

data.brand = data.brand.replace(r'^\s*$', 'nan', regex=True)

This sets the replacement value to the string 'nan' from the start, so the third line works on the first run.
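To check that the replacement gives the same result on every run, the three lines can be wrapped in a function and applied twice. This is a sketch on made-up data; the function names are hypothetical. The second function is not part of this answer: it is one possible alternative that keeps real missing values and tests them with notna() instead of comparing against a string.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third entry is whitespace only.
data = pd.DataFrame({"brand": ["Brand1", "Brand2", " ", "Brand3"]})

def with_string_marker(df):
    """The replacement above: blanks become the string 'nan'."""
    out = df.copy()
    out["brand"] = out["brand"].astype(str)
    out["brand"] = out["brand"].replace(r"^\s*$", "nan", regex=True)
    out["branded"] = np.where(out["brand"] != "nan", True, False)
    return out

def with_notna(df):
    """Alternative (an assumption, not from the answer): keep real NaN."""
    out = df.copy()
    # Blank or whitespace-only strings become real missing values.
    out["brand"] = out["brand"].replace(r"^\s*$", np.nan, regex=True)
    # notna() is False exactly where the value is missing.
    out["branded"] = out["brand"].notna()
    return out

# Apply each variant twice to confirm the result does not change.
marker_once = with_string_marker(data)
marker_twice = with_string_marker(marker_once)
notna_once = with_notna(data)
notna_twice = with_notna(notna_once)

print(marker_once["branded"].tolist())
print(notna_twice["branded"].tolist())
```

Note that the string-marker variant would also flag a brand that is literally named "nan" as unbranded, which the notna() variant avoids.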

Upvotes: 2
