Reputation: 47
I would like to keep only the rows of a dataframe where the value in column "Size" is below 10.000.000.
The dataframe example is below (original file is much larger):
    N       Ret  upside_tri        Size
0  77  0.000000      5.2256  58,019,065
1  77  0.000000      1.3836     969,692
2  77  0.000000      1.3543  12,792,661
3  77  0.000000      0.8839   5,721,553
4  77  0.000000      0.5477   6,984,648
In order to filter column "Size" with only values below 10.000.000, I am running the following code:
df = df[df.iloc[:, 3] < 10000000]
When I run this code, I keep receiving the error: '<' not supported between instances of 'str' and 'int'.
Column "Size" appears to contain only integers, so this error does not make sense to me.
Upvotes: 2
Views: 5829
Reputation: 195408
The column "Size" is of type str (the values contain thousands separators). Try converting it to integer first:
df["Size"] = df["Size"].str.replace(",", "").astype(int)
print(df[df.iloc[:, 3] < 10000000])
Prints:
    N  Ret  upside_tri     Size
1  77  0.0      1.3836   969692
3  77  0.0      0.8839  5721553
4  77  0.0      0.5477  6984648
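For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the sample in the question, on the assumption that "Size" was loaded as strings with comma separators; the names size_as_int and filtered are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question. "Size" is assumed to hold strings
# such as "58,019,065", which is what triggers the str-vs-int comparison error.
df = pd.DataFrame({
    "N": [77, 77, 77, 77, 77],
    "Ret": [0.0, 0.0, 0.0, 0.0, 0.0],
    "upside_tri": [5.2256, 1.3836, 1.3543, 0.8839, 0.5477],
    "Size": ["58,019,065", "969,692", "12,792,661", "5,721,553", "6,984,648"],
})

# Inspect one value to confirm the column really holds strings, not integers.
print(type(df["Size"].iloc[0]))

# Strip the thousands separators (literal replacement, no regex) and convert.
size_as_int = df["Size"].str.replace(",", "", regex=False).astype(int)

# Filter with the numeric values; the original column is left untouched.
filtered = df[size_as_int < 10_000_000]
print(filtered)
```

Checking the type of a single value first is a quick way to see why the comparison fails before changing anything.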
Or:
mask = df["Size"].str.replace(",", "").astype(int) < 10000000
print(df.loc[mask])
Prints:
    N  Ret  upside_tri       Size
1  77  0.0      1.3836    969,692
3  77  0.0      0.8839  5,721,553
4  77  0.0      0.5477  6,984,648
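If the larger file might contain entries that are not clean numbers, astype(int) would raise an error. A possible variant, sketched here under that assumption, uses pd.to_numeric with errors="coerce" so that such entries become NaN and are simply excluded by the comparison. The "n/a" value below is a made-up example and does not come from the question.

```python
import pandas as pd

# Hypothetical column: three values from the question plus one invented bad entry.
sizes = pd.Series(["58,019,065", "969,692", "n/a", "5,721,553"], name="Size")

# Remove separators, then convert; anything unparseable becomes NaN instead of raising.
numeric = pd.to_numeric(sizes.str.replace(",", "", regex=False), errors="coerce")

# NaN < 10_000_000 evaluates to False, so the bad entry is dropped by the mask.
mask = numeric < 10_000_000
kept = sizes[mask]
print(kept)
```

Note that the converted values come back as floats when any NaN is present, which is fine for the comparison but worth knowing if the numbers are reused later.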
Upvotes: 3