dgLurn

Reputation: 51

Can't Do Math on Column If Some Rows Are of Type String

Here is a sample of my df:

    units      price
0   143280.0   0.8567
1   4654.0     464.912
2   512210.0   607
3   Unknown    0
4   Unknown    0

I have the following code:

myDf.loc[(myDf["units"].str.isnumeric())&(myDf["price"].str.isnumeric()),'newValue']=(    
myDf["price"].astype(float).fillna(0.0)*
myDf["units"].astype(float).fillna(0.0)/
1000)

As you can see, I'm trying to only do math to create the 'newValue' column for rows where the two source columns are both numeric. However, I get the following error:

ValueError: could not convert string to float: 'Unknown'

So it seems that even though I'm attempting to perform math only on the rows that don't have text, Pandas does not like that any of the rows have text.

Note that I need to maintain the instances of "Unknown" exactly as they are and so filling those with zero is not a good option.

This has me pretty stumped. I could not find any solutions by searching Google.

Would appreciate any help/solutions.

Upvotes: 0

Views: 176

Answers (1)

dataista

Reputation: 3457

Apply the same condition on the right side of the = that you already use on the left side, as follows (I store the condition in a variable is_num for readability):

# regex=False makes '.' a literal dot rather than a regex wildcard
is_num = (myDf["units"].astype(str).str.replace('.', '', regex=False).str.isnumeric()) & (myDf["price"].astype(str).str.replace('.', '', regex=False).str.isnumeric())
myDf.loc[is_num,'newValue']=( 
       myDf.loc[is_num, "price"].astype(float).fillna(0.0)* 
       myDf.loc[is_num, "units"].astype(float).fillna(0.0)/1000)
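
The reason the mask has to appear on both sides is that the right-hand expression is evaluated over the whole column before .loc picks rows, so astype(float) still runs into 'Unknown'. Here is a minimal sketch of that on a reconstruction of the sample data. The dtypes are an assumption on my part (units stored as strings, price as floats); check them against your real dataframe.

```python
import pandas as pd

# Hypothetical reconstruction of the sample; the real dtypes may differ.
myDf = pd.DataFrame({
    "units": ["143280.0", "4654.0", "512210.0", "Unknown", "Unknown"],
    "price": [0.8567, 464.912, 607.0, 0.0, 0.0],
})

# Unmasked right-hand side: the cast runs over every row and fails on 'Unknown'.
try:
    myDf["units"].astype(float)
    cast_failed = False
except ValueError:
    cast_failed = True

# Masked right-hand side: only the numeric-looking rows are cast.
is_num = myDf["units"].astype(str).str.replace(".", "", regex=False).str.isnumeric()
myDf.loc[is_num, "newValue"] = (
    myDf.loc[is_num, "price"] * myDf.loc[is_num, "units"].astype(float) / 1000
)
```

Rows where is_num is False get NaN in newValue, and the "Unknown" entries in units are left as they are.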

Also, you need to check against your real dataframe, but going by this example, you can:

  1. Remove the fillna(0.0), since there are no NaNs in the selected rows
  2. Remove the check on 'price' (in your example, price is always numeric, so the check is not necessary)
  3. Remove the astype(float) cast for price, since it's already numeric.

That would lead to the following somewhat more concise code:

# regex=False makes '.' a literal dot rather than a regex wildcard
is_num = myDf["units"].astype(str).str.replace('.', '', regex=False).str.isnumeric()
myDf.loc[is_num,'newValue']=( 
       myDf.loc[is_num, "price"]* 
       myDf.loc[is_num, "units"].astype(float)/1000)
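
A different technique from the mask above, in case it fits your data better: pd.to_numeric with errors="coerce" turns anything unparseable into NaN in a separate Series, so no string check is needed and the original column is not modified. This is only a sketch on the same assumed reconstruction of the sample (units as strings, price as floats); units_num and price_num are names I made up for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the sample; the real dtypes may differ.
myDf = pd.DataFrame({
    "units": ["143280.0", "4654.0", "512210.0", "Unknown", "Unknown"],
    "price": [0.8567, 464.912, 607.0, 0.0, 0.0],
})

# Numeric copies: 'Unknown' becomes NaN here, but myDf["units"] is untouched.
units_num = pd.to_numeric(myDf["units"], errors="coerce")
price_num = pd.to_numeric(myDf["price"], errors="coerce")

# NaN propagates through the arithmetic, so non-numeric rows get NaN.
myDf["newValue"] = price_num * units_num / 1000
```

Unlike the isnumeric check, this also accepts negative numbers and scientific notation, which may or may not be what you want.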

Upvotes: 1
