Reputation: 20302
I am trying to filter a data frame based on two conditions.
Here is the code that I tried, and I got errors, or I wouldn't be posting here.
df.loc[(df['Count']>=30) & (df['LAND SQUARE FEET']< 5000)['Count','LAND SQUARE FEET']]
df[df.eval("Count>=30 & (LAND SQUARE FEET <5000).values")]
How can I get this to work?
Upvotes: 0
Views: 73
Reputation: 16683
Specifically, the error(see comments of other answer) is now saying there is a -
in one of the values in your column, You can use my solution below OR you can do df["LAND SQUARE FEET"] = df["LAND SQUARE FEET"] =.replace('-','').astype(int)
However, there may be other strings that you need to replace, that might mean you keep seeing errors if there are more strings, other than -
, for example ,
or ft.
. Also, that line with -
might be bad data altogether as I'm not sure why a -
would be in a number that is supposed to be an integer of square feet.
Also, you can look at that line specifically with df[df["LAND SQUARE FEET"].str.contains('-')
and from there decide what you want to do with it -- either manipulate it's data with replace
or make it NaN
with pd.to_numeric()
Solution with pd.to_numeric()
:
You need to use pd.to_numeric()
first as you have strings in your column and passing errors='coerce'
changes to NaN
for those values that are strings. The data type of the column should now be a float
if you call df.info()
:
Step 1:
df = pd.DataFrame({"LAND SQUARE FEET" : [4500, '4,400 feet', '4,600', 4700, 5500, 6000],
"Count" : [45,55,65,75,15,25]})
df
Out[1]:
LAND SQUARE FEET Count
0 4500 45
1 4,400 feet 55
2 4,600 65
3 4700 75
4 5500 15
5 6000 25
Step 2:
df = pd.DataFrame({"LAND SQUARE FEET" : [4500, '4,400 feet', '4,600', 4700, 5500, 6000],
"Count" : [45,55,65,75,15,25]})
df["LAND SQUARE FEET"] = pd.to_numeric(df["LAND SQUARE FEET"], errors='coerce')
df
Out[2]:
LAND SQUARE FEET Count
0 4500.0 45
1 NaN 55
2 NaN 65
3 4700.0 75
4 5500.0 15
5 6000.0 25
Step 3 (and final output):
df = pd.DataFrame({"LAND SQUARE FEET" : [4500, '4,400 feet', '4,600', 4700, 5500, 6000],
"Count" : [45,55,65,75,15,25]})
df["LAND SQUARE FEET"] = pd.to_numeric(df["LAND SQUARE FEET"], errors='coerce')
new_df = df.loc[(df['Count']>=30) & (df['LAND SQUARE FEET']< 5000),['Count','LAND SQUARE FEET']]
new_df
Out[3]:
Count LAND SQUARE FEET
0 45 4500.0
3 75 4700.0
Upvotes: 1