noob

Reputation: 3811

TypeError: '<' not supported between instances of 'str' and 'int' when fetching only rows which meet two conditions

I have a df from which I need to select only the rows where the Condition1 column has the value 11 and Score is positive.

Condition1   Score
11            100
12            100            
11            -2  
11            200
11             11
11            -300
10             200

 Expected output

Condition1   Score
11            100
11            200
11             11

Code:

 df.loc[df.Condition1.eq(11) & (np.sign(df.score) >= 0)]

Error obtained

TypeError: '<' not supported between instances of 'str' and 'int'

I checked the dtypes of Score and Condition1. Condition1 is int and Score is object. Can that be the problem?

Upvotes: 0

Views: 502

Answers (1)

Valdi_Bo

Reputation: 31011

Your df probably holds a text representation of the numbers in the Score column, as if you had created it by running:

df = pd.DataFrame({'Condition1': [11, 12, 11, 11, 11, 11, 10],
    'Score': ['100', '100  ', '-2 ', '200', ' 11', '-300', ' 200']})

Note that:

  • the second and third elements contain trailing spaces,
  • the fifth and seventh elements contain a leading space.

When you run df.info() the result is something like:

 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Condition1  7 non-null      int64 
 1   Score       7 non-null      object

but what you see on the screen only looks like numbers.
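One way to confirm this, sketched here on the sample frame above (the variable names cell_types and error_message are only illustrative), is to look at the Python type of each cell and to try the comparison on a single value:

```python
import pandas as pd

# Sample frame from this answer: Score holds strings, not numbers.
df = pd.DataFrame({'Condition1': [11, 12, 11, 11, 11, 11, 10],
    'Score': ['100', '100  ', '-2 ', '200', ' 11', '-300', ' 200']})

# Count the Python types stored in the object column.
cell_types = df['Score'].map(type).value_counts()
print(cell_types)

# repr() exposes the stray spaces that the normal display hides.
print(df['Score'].map(repr).tolist())

# Comparing one of these strings with an int raises a TypeError.
try:
    df['Score'].iloc[0] < 0
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

If every cell turns out to be a str, the object dtype is indeed the source of the error in the question.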

This is presumably also why a plain astype(int) fails for you: the strings above (with leading or trailing spaces) are not cleanly numeric text.

To cope with your problem:

  • first drop these spaces,
  • then convert this column to int.

The code to do it is:

df.Score = df.Score.str.replace(' ', '').astype(int)

Now when you run df.info(), the result should be something like this (the integer dtype may be int32 or int64, depending on your platform):

 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Condition1  7 non-null      int64
 1   Score       7 non-null      int32

Now that the Score column is numeric, when you run:

df.loc[df.Condition1.eq(11) & (np.sign(df.Score) >= 0)]

you should get the expected result.

Note: Your column is named Score (with upper case "S"), but in your code sample you wrote score (with lower case "s"). Remember to correct this detail.
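If some cells might hold text that is not a number at all, a more forgiving alternative to the conversion above (a different technique, shown only as a sketch) is str.strip() followed by pd.to_numeric with errors='coerce', which turns unparseable cells into NaN instead of raising. The 'n/a' cell below is a made-up value for illustration, not something taken from your data:

```python
import pandas as pd

# Hypothetical column: the last cell is not a number at all.
scores = pd.Series(['100', '100  ', '-2 ', ' 11', 'n/a'])

# Strip surrounding whitespace, then convert; bad cells become NaN.
cleaned = pd.to_numeric(scores.str.strip(), errors='coerce')
print(cleaned)

# Rows that failed to convert can be listed for inspection.
print(scores[cleaned.isna()])
```

Because NaN is a float, the resulting column is float64 rather than an integer dtype whenever at least one cell fails to convert.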

Edit

You can simplify the expression by dropping np.sign and comparing the column directly:

df.loc[df.Condition1.eq(11) & (df.Score >= 0)]
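Putting the pieces together, a minimal end-to-end sketch on the sample frame from this answer could look as follows (the name result is only illustrative, and regex=False is added so the space is always treated literally):

```python
import pandas as pd

# Sample frame from this answer, with Score stored as text.
df = pd.DataFrame({'Condition1': [11, 12, 11, 11, 11, 11, 10],
    'Score': ['100', '100  ', '-2 ', '200', ' 11', '-300', ' 200']})

# Drop the spaces and convert the column to integers.
df['Score'] = df['Score'].str.replace(' ', '', regex=False).astype(int)

# Keep rows where Condition1 is 11 and Score is non-negative.
result = df.loc[df['Condition1'].eq(11) & (df['Score'] >= 0)]
print(result)
```

Note that >= 0 also keeps a score of exactly 0. Use > 0 instead if only strictly positive scores should be selected.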

Upvotes: 1
