I have a df where I need to select only the rows where the Condition1 column has the value 11 and Score is positive.
Condition1 Score
11 100
12 100
11 -2
11 200
11 11
11 -300
10 200
Expected output
Condition1 Score
11 100
11 200
11 11
Code:
df.loc[df.Condition1.eq(11) & (np.sign(df.score) >= 0)]
Error obtained
TypeError: '<' not supported between instances of 'str' and 'int'
I checked the dtypes of Score and Condition1. Condition1 is int and Score is object. Could that be the problem?
Your df probably holds text representations of numbers in the Score column, as if you had created it by running:
df = pd.DataFrame({'Condition1': [11, 12, 11, 11, 11, 11, 10],
'Score': ['100', '100 ', '-2 ', '200', ' 11', '-300', ' 200']})
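A quick way to confirm this diagnosis is to check the dtypes and try the comparison directly. The sketch below uses a hypothetical two-row frame and passes dtype=object explicitly so that Score is stored as Python strings regardless of the pandas version; an ordering comparison between str and int is not defined in Python 3, so pandas raises a TypeError.

```python
import pandas as pd

# Hypothetical cut-down frame; dtype=object forces Score to hold Python strings
df = pd.DataFrame({'Condition1': [11, 12],
                   'Score': pd.Series(['100', '-2 '], dtype=object)})

print(df.dtypes)  # Score is reported as object, not int64

# Comparing strings with an int using >= (or <) is not allowed in Python 3
comparison_error = None
try:
    df['Score'] >= 0
except TypeError as exc:
    comparison_error = exc
    print('TypeError:', exc)
```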
Note that when you run df.info(), the result is something like:
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Condition1 7 non-null int64
1 Score 7 non-null object
In other words, Score holds strings; what you see on the screen only looks like numbers.
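Printing the DataFrame hides the stray spaces. One way to make them visible, sketched here on the same sample data as above (again forcing dtype=object), is to look at the repr of each value and to compare each string with its stripped version:

```python
import pandas as pd

df = pd.DataFrame({'Condition1': [11, 12, 11, 11, 11, 11, 10],
                   'Score': pd.Series(['100', '100 ', '-2 ', '200', ' 11', '-300', ' 200'],
                                      dtype=object)})

# repr() shows the quotes and any leading/trailing spaces that print(df) hides
raw_values = df['Score'].map(repr).tolist()
print(raw_values)

# True where the text changes once leading/trailing whitespace is removed
has_padding = df['Score'] != df['Score'].str.strip()
print(df.loc[has_padding, 'Score'].tolist())
```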
This is exactly why a plain astype(int) fails: the strings above (with leading or trailing spaces) cannot be converted to int.
To fix the problem, remove the spaces and then convert the column to int:
df.Score = df.Score.str.replace(' ', '').astype(int)
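A variant of the same fix, offered as an alternative rather than as the answer's method: str.strip() removes only leading and trailing whitespace (replace(' ', '') would also delete spaces inside a value), and pd.to_numeric with errors='coerce' turns any entry that still cannot be parsed into NaN instead of raising. Note that if any NaN is produced, the resulting column becomes float64 rather than an integer type.

```python
import pandas as pd

df = pd.DataFrame({'Condition1': [11, 12, 11, 11, 11, 11, 10],
                   'Score': pd.Series(['100', '100 ', '-2 ', '200', ' 11', '-300', ' 200'],
                                      dtype=object)})

# Strip surrounding whitespace, then parse; unparseable entries would become NaN
df['Score'] = pd.to_numeric(df['Score'].str.strip(), errors='coerce')
print(df.dtypes)
```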
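Now, when you run df.info(), the result should be something like this (whether Score shows up as int32 or int64 depends on the platform; on Windows, NumPy versions before 2.0 map int to int32):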
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Condition1 7 non-null int64
1 Score 7 non-null int32
Now that the Score column is numeric, running:
df.loc[df.Condition1.eq(11) & (np.sign(df.Score) >= 0)]
you should get the expected result.
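Putting the steps together, a self-contained sketch that rebuilds the sample data from the question, applies the answer's conversion and runs the corrected filter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Condition1': [11, 12, 11, 11, 11, 11, 10],
                   'Score': pd.Series(['100', '100 ', '-2 ', '200', ' 11', '-300', ' 200'],
                                      dtype=object)})

# Remove the spaces, then convert the text to integers
df['Score'] = df['Score'].str.replace(' ', '').astype(int)

# Condition1 must equal 11 and Score must be non-negative
result = df.loc[df.Condition1.eq(11) & (np.sign(df.Score) >= 0)]
print(result)
```

The rows kept are those with index 0, 3 and 4, i.e. the Score values 100, 200 and 11 from the expected output.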
Note: your column is named Score (uppercase "S"), but in your code sample you wrote score (lowercase "s"). Remember to correct this as well.
You can also simplify the expression to:
df.loc[df.Condition1.eq(11) & (df.Score >= 0)]
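One detail worth checking: the question asks for positive scores, but `>= 0` (like `np.sign(...) >= 0`) also keeps rows where Score is exactly 0. The sample data contains no zeros, so both forms return the same rows there. The sketch below uses a hypothetical frame that includes a zero to show the difference; `Series.gt(0)` is equivalent to `> 0`:

```python
import pandas as pd

# Hypothetical data with a zero score to expose the boundary case
df = pd.DataFrame({'Condition1': [11, 11, 11, 12],
                   'Score': [100, 0, -2, 5]})

# Non-negative: the row with Score == 0 is kept
non_negative = df.loc[df.Condition1.eq(11) & (df.Score >= 0)]

# Strictly positive: the row with Score == 0 is dropped
strictly_positive = df.loc[df.Condition1.eq(11) & df.Score.gt(0)]

print(non_negative)
print(strictly_positive)
```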