Elvis

Reputation: 47

Pandas error "'<' not supported between instances of 'str' and 'int'" when filtering a column that contains only integers

I would like to keep only the rows of a dataframe whose value in the "Size" column is below 10,000,000.

The dataframe example is below (original file is much larger):

        N       Ret  upside_tri        Size
0      77  0.000000      5.2256  58,019,065
1      77  0.000000      1.3836     969,692
2      77  0.000000      1.3543  12,792,661
3      77  0.000000      0.8839   5,721,553
4      77  0.000000      0.5477   6,984,648

To keep only the rows where "Size" is below 10,000,000, I am running the following code:

df = df[df.iloc[:, 3] < 10000000]

Running it raises the error '<' not supported between instances of 'str' and 'int'.

The "Size" column contains only integers, so this error does not make sense to me.

Upvotes: 2

Views: 5829

Answers (1)

Andrej Kesely

Reputation: 195408

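The column "Size" holds strings, not integers: values such as "58,019,065" contain thousands separators, so pandas stores them as str. Convert the column to integers first: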

df["Size"] = df["Size"].str.replace(",", "").astype(int)
print(df[df.iloc[:, 3] < 10000000])

Prints:

    N  Ret  upside_tri     Size
1  77  0.0      1.3836   969692
3  77  0.0      0.8839  5721553
4  77  0.0      0.5477  6984648

Or:

mask = df["Size"].str.replace(",", "").astype(int) < 10000000
print(df.loc[mask])

Prints:

    N  Ret  upside_tri       Size
1  77  0.0      1.3836    969,692
3  77  0.0      0.8839  5,721,553
4  77  0.0      0.5477  6,984,648
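
If the data comes from a text file, it may be simpler to let pandas strip the separators while parsing. The sketch below is only an illustration under assumptions: the tab-separated `raw` string stands in for the original file (whose real format is not shown in the question), and the names `df_parsed`, `df_str` and `filtered` are made up for the example. It also shows `pd.to_numeric` with `errors="coerce"`, which turns any value that still cannot be parsed into NaN instead of raising.

```python
import io

import pandas as pd

# Hypothetical stand-in for the original file: tab-separated, so the commas
# inside "Size" are not mistaken for column delimiters.
raw = (
    "N\tRet\tupside_tri\tSize\n"
    "77\t0.0\t5.2256\t58,019,065\n"
    "77\t0.0\t1.3836\t969,692\n"
    "77\t0.0\t1.3543\t12,792,661\n"
    "77\t0.0\t0.8839\t5,721,553\n"
    "77\t0.0\t0.5477\t6,984,648\n"
)

# Option 1: tell read_csv which character is the thousands separator,
# so "Size" is parsed as an integer column from the start.
df_parsed = pd.read_csv(io.StringIO(raw), sep="\t", thousands=",")
print(df_parsed.dtypes)
print(df_parsed[df_parsed["Size"] < 10_000_000])

# Option 2: the column is already loaded as strings. Strip the commas and
# convert; errors="coerce" maps unparseable values to NaN, and NaN never
# satisfies the comparison, so such rows are simply left out.
df_str = pd.read_csv(io.StringIO(raw), sep="\t", dtype={"Size": str})
size = pd.to_numeric(
    df_str["Size"].str.replace(",", "", regex=False), errors="coerce"
)
filtered = df_str[size < 10_000_000]
print(filtered)
```

Checking `df.dtypes` before filtering is a quick way to confirm whether a column that looks numeric is actually stored as text.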

Upvotes: 3
