Reputation: 489
I need to update a column's values based on these conditions:
i. if score > 3, set score to 1.
ii. if score <= 2, set score to 0.
iii. if score == 3, drop that row.
Score takes values from 1 to 5.
I have written the following code, but all the values are being changed to 0:
reviews.loc[reviews['Score'] > 3, 'Score'] = 1
reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
Please point out the mistake I am making here.
Upvotes: 3
Views: 6102
Reputation: 862406
There is a logic problem. Start with some sample data:
import pandas as pd
reviews = pd.DataFrame({'Score':range(6)})
print (reviews)
   Score
0      0
1      1
2      2
3      3
4      4
5      5
Setting all values greater than 3 to 1 works as needed:
reviews.loc[reviews['Score'] > 3, 'Score'] = 1
print (reviews)
   Score
0      0
1      1
2      2
3      3
4      1
5      1
Then all values less than or equal to 2 are set to 0. Because 1 <= 2, this also overwrites the 1 values that were just assigned by the reviews['Score'] > 3 step:
reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
print (reviews)
   Score
0      0
1      0
2      0
3      3
4      0
5      0
Finally the rows with 3 are removed, so only 0 values remain:
reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
print (reviews)
   Score
0      0
1      0
2      0
4      0
5      0
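One way to see that the problem is only the order of evaluation is to compute every condition on the original values before assigning anything. This is a minimal sketch using the same sample data; the names high, low and neutral are only illustrative and are not part of the original code:

```python
import pandas as pd

reviews = pd.DataFrame({'Score': range(6)})

# Evaluate all conditions on the ORIGINAL values, before any assignment
high = reviews['Score'] > 3      # illustrative name
low = reviews['Score'] <= 2      # illustrative name
neutral = reviews['Score'] == 3  # illustrative name

# The masks are already fixed, so the order of these two lines no longer matters
reviews.loc[high, 'Score'] = 1
reviews.loc[low, 'Score'] = 0

# Drop the rows that originally held 3
reviews = reviews[~neutral].copy()
print(reviews)
```

Because the masks no longer depend on values that were already rewritten, the 1 values are not caught by the <= 2 condition.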
You can change the solution like this:
reviews = pd.DataFrame({'Score':range(6)})
print (reviews)
   Score
0      0
1      1
2      2
3      3
4      4
5      5
First remove the 3 rows by keeping only the rows not equal to 3, using boolean indexing:
reviews = reviews[reviews['Score'] != 3].copy()
Then set the remaining values to 0 and 1:
reviews['Score'] = (reviews['Score'] > 3).astype(int)
# alternative with NumPy (requires: import numpy as np)
reviews['Score'] = np.where(reviews['Score'] > 3, 1, 0)
print (reviews)
   Score
0      0
1      0
2      0
4      1
5      1
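If it helps, the filter-then-compare approach can be wrapped in a small helper and tried on scores in the 1 to 5 range from the question. This is only a sketch: the function name binarize_scores, its parameters, and the sample values are assumptions made for illustration:

```python
import pandas as pd

def binarize_scores(df, col='Score', neutral=3):
    """Hypothetical helper: drop rows equal to `neutral`, then map the rest to 0/1."""
    # Keep only rows that are not the neutral score; copy to avoid chained-assignment warnings
    out = df[df[col] != neutral].copy()
    # True where the score is above the neutral value, converted to 1/0
    out[col] = (out[col] > neutral).astype(int)
    return out

# Made-up sample data with scores between 1 and 5
reviews = pd.DataFrame({'Score': [1, 2, 3, 4, 5, 5, 3, 1]})
result = binarize_scores(reviews)
print(result)
```

The original reviews DataFrame is left untouched here, since the helper works on a filtered copy.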
EDIT1:
Your solution also works if you swap the first two lines: set 0 first and then 1, to avoid overwriting values:
reviews.loc[reviews['Score'] <= 2, 'Score'] = 0
reviews.loc[reviews['Score'] > 3, 'Score'] = 1
reviews.drop(reviews[reviews['Score'] == 3].index, inplace = True)
print (reviews)
   Score
0      0
1      0
2      0
4      1
5      1
Upvotes: 3