What is the difference between the code below?

Question

What is the difference between these two lines of code?

print(df.drop(df.where(df['Quantity']==0).index).rename(columns={'Weight':'Weight(oz)'}))

and

print(df.drop(df[df['Quantity'] == 0].index).rename(columns={'Weight': 'Weight (oz.)'}))

In other words, what is the difference between the index returned by

df.where(df['Quantity']==0).index

and the index returned by

df[df['Quantity'] == 0].index

jezrael · Accepted Answer

The difference is that the first line uses DataFrame.where:

df.where(df['Quantity']==0).index

This only replaces the non-matching rows with NaN and keeps every row, so the index of the result is the same as that of the original df.

But if you use:

df[df['Quantity'] == 0].index

this is boolean indexing, which filters the DataFrame by the condition, so the index of the result contains only the labels of the matching rows and differs from that of the original df.

Sample:

import pandas as pd

df = pd.DataFrame({'Quantity': [0, 1, 2, 1, 1, 0],
                   'Weight':   [4, 5, 6, 7, 7, 8]},
                  index=list('abcdef'))
print(df)
   Quantity  Weight
a         0       4
b         1       5
c         2       6
d         1       7
e         1       7
f         0       8

# where() keeps every index label, so drop() removes all rows: empty DataFrame
print(df.drop(df.where(df['Quantity']==0).index).rename(columns={'Weight':'Weight(oz)'}))
Empty DataFrame
Columns: [Quantity, Weight(oz)]
Index: []

print(df.where(df['Quantity']==0).index)
Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')

print(df.where(df['Quantity']==0))
   Quantity  Weight
a       0.0     4.0
b       NaN     NaN
c       NaN     NaN
d       NaN     NaN
e       NaN     NaN
f       0.0     8.0

# boolean indexing returns only the labels 'a' and 'f', so drop() removes just the rows with 0 in Quantity
print(df.drop(df[df['Quantity'] == 0].index).rename(columns={'Weight': 'Weight (oz.)'}))
   Quantity  Weight (oz.)
b         1             5
c         2             6
d         1             7
e         1             7

print(df[df['Quantity'] == 0].index)
Index(['a', 'f'], dtype='object')

print(df[df['Quantity'] == 0])
   Quantity  Weight
a         0       4
f         0       8
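If you do want to start from the where-based version, one option is to drop the rows that where turned entirely into NaN before taking the index. This is only a minimal sketch: it assumes the same sample df as above and that no row was all-NaN to begin with, and the names mask and kept are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Quantity': [0, 1, 2, 1, 1, 0],
                   'Weight':   [4, 5, 6, 7, 7, 8]},
                  index=list('abcdef'))

# Boolean Series with one True/False value per row
mask = df['Quantity'] == 0

# where() keeps every row and sets the non-matching rows to NaN;
# dropping the all-NaN rows leaves only the labels that matched.
# Assumption: the original data has no rows that are entirely NaN.
kept = df.where(mask).dropna(how='all')

print(kept.index.tolist())      # labels left after where + dropna
print(df[mask].index.tolist())  # labels selected by boolean indexing
```

Both calls print the same labels here, but the where route also converts the integer columns to float because of the intermediate NaNs, so boolean indexing is usually the more direct choice.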

In my opinion the solution with drop is overcomplicated. It is simpler to invert the logic and select all rows where Quantity is not 0:

print(df[df['Quantity'] != 0].rename(columns={'Weight': 'Weight (oz.)'}))
   Quantity  Weight (oz.)
b         1             5
c         2             6
d         1             7
e         1             7
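The same inverse-logic filter can be spelled in a few equivalent ways. The sketch below assumes the sample df from above; the variable names are illustrative only, and the choice between the spellings is mostly a matter of taste.

```python
import pandas as pd

df = pd.DataFrame({'Quantity': [0, 1, 2, 1, 1, 0],
                   'Weight':   [4, 5, 6, 7, 7, 8]},
                  index=list('abcdef'))

# Rows to keep: Quantity is not 0
nonzero = df['Quantity'] != 0

by_mask = df[nonzero]                                # boolean indexing
by_loc = df.loc[nonzero]                             # same mask, explicit .loc
by_query = df.query('Quantity != 0')                 # condition as a string
by_drop = df.drop(df[df['Quantity'] == 0].index)     # drop-by-label version

# On this sample (unique index labels) all four give the same frame
assert by_mask.equals(by_loc)
assert by_mask.equals(by_query)
assert by_mask.equals(by_drop)

print(by_mask.rename(columns={'Weight': 'Weight (oz.)'}))
```

One practical difference to keep in mind: drop works by label, so if the index contains duplicate labels it removes every row sharing a label with a zero-quantity row, while the boolean mask removes only the rows that actually match.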
