Oscar Agbor

Reputation: 71

How do I conditionally remove rows from my dataframe without using for loops?

I am trying to filter an AppleStore.csv dataframe based on price. I want to create a new dataframe that includes only free apps. Below is the code I used to apply the same condition to a googleplaystore.csv dataframe, where it worked fine.

import numpy as np
import pandas as pd


df_A = pd.read_csv("AppleStore.csv") 
df_G = pd.read_csv('googleplaystore.csv')

df_G.dropna(axis = 0, how = "any", inplace = True)

df_gg = df_G[df_G.Price == '0'] # df_gg is the new google apps df with only free apps

df_apple = df_A[df_A.price == '0.0'] 

When I run the code above, the Apple filter returns only the column header row, along with this warning:

C:\Users\Dan\anaconda3\lib\site-packages\pandas\core\ops\array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)

I am quite uncertain as to what to do. Any and all help is appreciated.

Upvotes: 0

Views: 49

Answers (2)

Umar.H

Reputation: 23099

This is because you're comparing a string to an integer or float column:

df1 = pd.DataFrame({'price' : [0,1]})

df1[df1.price == '0']

 FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)

Whereas:

df1[df1.price == 0]


 price
0   0
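
One way to see which kind of comparison a column needs is to check whether it is numeric before filtering. The sketch below uses small made-up frames in place of the two CSV files; the values are assumptions for illustration, not the real data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-ins for the two CSV files; the real contents may differ.
df_G = pd.DataFrame({"Price": ["0", "$4.99", "0"]})  # prices stored as text
df_A = pd.DataFrame({"price": [0.0, 2.99, 0.0]})     # prices stored as floats

# A text column has to be compared with a string, a numeric column with a number.
google_is_numeric = is_numeric_dtype(df_G["Price"])
apple_is_numeric = is_numeric_dtype(df_A["price"])

free_google = df_G[df_G["Price"] == "0"]
free_apple = df_A[df_A["price"] == 0]
```

With frames like these, the string comparison only matches on the text column and the numeric comparison only matches on the float column, which would explain why the same filter behaved differently on the two files.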

Upvotes: 1

jezrael

Reputation: 862481

The problem is that the columns are numeric, or contain a mix of numbers and strings. If the columns are entirely numeric, compare against the number 0 instead of the strings '0' and '0.0':

df_gg = df_G[df_G.Price == 0] 
df_apple = df_A[df_A.price == 0] 

If the types are mixed, for example because missing values were replaced with the number 0, try converting the columns to numeric first:

df_G.Price = df_G.Price.astype(float)
df_A.price = df_A.price.astype(float)

And then compare:

df_gg = df_G[df_G.Price == 0] 
df_apple = df_A[df_A.price == 0] 
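
If a column contains text that cannot be parsed as a number, astype(float) raises a ValueError. A possible alternative, sketched here on a made-up column rather than the real data, is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so the numeric comparison still works.

```python
import pandas as pd

# Hypothetical mixed column: numbers stored as text plus one unparseable entry.
df = pd.DataFrame({"price": ["0", "0.0", "1.99", "Everyone"]})

# errors="coerce" converts values that cannot be parsed into NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# NaN never equals 0, so the unparseable row is excluded from the result.
free_apps = df[df["price"] == 0]
```

Coercing silently discards whatever the bad entries contained, so it may be worth counting the NaN values afterwards to see how many rows were affected.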

Upvotes: 2
