Reputation: 180
What is the difference between these two lines of code?
print(df.drop(df.where(df['Quantity']==0).index).rename(columns={'Weight':'Weight(oz)'}))
and
print(df.drop(df[df['Quantity'] == 0].index).rename(columns={'Weight': 'Weight (oz.)'}))
In other words, what is the difference between
df.where(df['Quantity']==0).index
with the following output
And
df[df['Quantity'] == 0].index
with the following output
Upvotes: 2
Views: 108
Reputation: 863611
The difference is that the first version uses DataFrame.where:
df.where(df['Quantity']==0).index
It only replaces the non-matching rows with NaNs and keeps every row, so the index of the result is the same as the index of the original df.
But if you use:
df[df['Quantity'] == 0].index
this is called boolean indexing. It filters the DataFrame by the condition, so the result contains only the matching rows and its index differs from the index of the original df.
Sample:
import pandas as pd

df = pd.DataFrame({'Quantity': [0, 1, 2, 1, 1, 0],
                   'Weight': [4, 5, 6, 7, 7, 8]},
                  index=list('abcdef'))
print (df)
   Quantity  Weight
a         0       4
b         1       5
c         2       6
d         1       7
e         1       7
f         0       8
# df.where keeps every index label, so drop removes all rows and returns an empty DataFrame
print(df.drop(df.where(df['Quantity']==0).index).rename(columns={'Weight':'Weight(oz)'}))
Empty DataFrame
Columns: [Quantity, Weight(oz)]
Index: []
print (df.where(df['Quantity']==0).index)
Index(['a', 'b', 'c', 'd', 'e', 'f'], dtype='object')
print (df.where(df['Quantity']==0))
   Quantity  Weight
a       0.0     4.0
b       NaN     NaN
c       NaN     NaN
d       NaN     NaN
e       NaN     NaN
f       0.0     8.0
# boolean indexing returns only labels 'a' and 'f', so only the rows with 0 in Quantity are removed
print(df.drop(df[df['Quantity'] == 0].index).rename(columns={'Weight': 'Weight (oz.)'}))
   Quantity  Weight (oz.)
b         1             5
c         2             6
d         1             7
e         1             7
print (df[df['Quantity'] == 0].index)
Index(['a', 'f'], dtype='object')
print (df[df['Quantity'] == 0])
   Quantity  Weight
a         0       4
f         0       8
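The same comparison can be checked programmatically. This is only a minimal sketch that rebuilds the sample frame; the names mask, masked and filtered are introduced here for illustration. It also shows a side effect visible in the where output: because NaN is inserted, the integer columns are upcast to float.

```python
import pandas as pd

df = pd.DataFrame({'Quantity': [0, 1, 2, 1, 1, 0],
                   'Weight': [4, 5, 6, 7, 7, 8]},
                  index=list('abcdef'))

# Boolean Series, True where Quantity is 0 (illustrative name)
mask = df['Quantity'] == 0

masked = df.where(mask)   # same shape as df, non-matching rows become NaN
filtered = df[mask]       # only the matching rows are kept

# where() preserves the full index; boolean indexing does not
print(masked.index.equals(df.index))
print(list(filtered.index))

# NaN forces the integer columns to float in the where() result
print(masked['Quantity'].dtype, filtered['Quantity'].dtype)
```

Passing masked.index to df.drop therefore names every row, while filtered.index names only the rows that actually match.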
In my opinion, the solution with drop is overcomplicated. It is simpler to invert the logic and select all rows where the Quantity column is not 0:
print(df[df['Quantity'] != 0].rename(columns={'Weight': 'Weight (oz.)'}))
   Quantity  Weight (oz.)
b         1             5
c         2             6
d         1             7
e         1             7
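As a rough check of that suggestion, the sketch below compares the drop-based version with the inverted condition on the sample data. The variable names and the small dup frame with a repeated label are made up for illustration. The second part is a hedged aside: since drop works on labels, it removes every row sharing a dropped label, which the mask approach avoids.

```python
import pandas as pd

df = pd.DataFrame({'Quantity': [0, 1, 2, 1, 1, 0],
                   'Weight': [4, 5, 6, 7, 7, 8]},
                  index=list('abcdef'))

# Three ways to keep the rows whose Quantity is not 0
via_drop = df.drop(df[df['Quantity'] == 0].index)
via_mask = df[df['Quantity'] != 0]
via_negation = df[~(df['Quantity'] == 0)]

# On a unique index all three give the same frame
print(via_drop.equals(via_mask))
print(via_mask.equals(via_negation))

# Hypothetical frame with a duplicated label 'x'
dup = pd.DataFrame({'Quantity': [0, 1]}, index=['x', 'x'])

# drop removes both 'x' rows; the mask keeps the non-zero one
print(len(dup.drop(dup[dup['Quantity'] == 0].index)))
print(len(dup[dup['Quantity'] != 0]))
```

With a unique index the approaches are interchangeable, so the choice is mostly about readability.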
Upvotes: 3