Reputation: 2881
Here's what I have in my dataframe-
RecordType Latitude Longitude Name
L 28.2N 70W Jon
L 34.3N 56W Dan
L 54.2N 72W Rachel
Note: The dtype of all the columns is object.
Now, in my final dataframe, I only want to include those rows in which the Latitude and Longitude fall in a certain range (say 24 < Latitude < 30 and 79 < Longitude < 87).
My idea is to apply a function to all the values in the Latitude and Longitude columns to first get float values like 28.2, etc., and then to compare the values to see if they fall into my range. So I wrote the following-
def numbers(value):
    return float(value[:-1])
result[u'Latitude'] = result[u'Latitude'].apply(numbers)
result[u'Longitude'] = result[u'Longitude'].apply(numbers)
But I get the following warning-
Warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I'm having a hard time understanding this since I'm new to Pandas. What's the best way to do this?
Upvotes: 2
Views: 1136
Reputation: 1261
As for why Pandas threw that particular "A value is trying to be set on a copy of a slice..." warning and how to avoid it:
First, using this syntax should prevent the warning:
result.loc[:,'Latitude'] = result['Latitude'].apply(numbers)
Pandas gave you the warning because the assignment of the .apply() result may be modifying a temporary copy of the Latitude/Longitude columns rather than the columns of the dataframe you meant to change. That is, the data may have been copied to a new location in memory before the operation is performed on it. The article you referenced (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) gives examples of why this can cause unexpected problems in certain situations.
Pandas recommends that you instead use syntax that ensures you are modifying a view of your dataframe's column, not a copy, when you assign the result of the .apply() operation. Doing this will ensure that your dataframe ends up being modified in the manner you expect. The code I wrote above using .loc tells Pandas to access and modify the contents of that column directly, and this should keep Pandas from throwing the warning that you saw.
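If the warning persists even with .loc, one common cause is that result was itself produced by slicing or filtering another dataframe. The question does not show how result was built, so the sketch below assumes a hypothetical parent frame df and a hypothetical filter on RecordType. Under that assumption, taking an explicit .copy() makes result an independent dataframe, so the column assignments are unambiguous:

```python
import pandas as pd

def numbers(value):
    # Drop the trailing hemisphere letter, e.g. "28.2N" -> 28.2
    return float(value[:-1])

# Hypothetical parent frame; the question does not show where `result` came from.
df = pd.DataFrame({
    "RecordType": ["L", "L", "L"],
    "Latitude": ["28.2N", "34.3N", "54.2N"],
    "Longitude": ["70W", "56W", "72W"],
    "Name": ["Jon", "Dan", "Rachel"],
})

# Assumption: `result` is a filtered subset of `df`. The explicit .copy()
# detaches it from `df`, so later assignments only ever touch `result`.
result = df[df["RecordType"] == "L"].copy()

result["Latitude"] = result["Latitude"].apply(numbers)
result["Longitude"] = result["Longitude"].apply(numbers)
```

The parent df keeps its original string values; only result holds the converted floats.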
Upvotes: 2
Reputation: 402263
If you don't want to modify df, I would suggest getting rid of the apply and vectorising this. One option is using eval.
u = df.assign(Latitude=df['Latitude'].str[:-1].astype(float))
u['Longitude'] = df['Longitude'].str[:-1].astype(float)
df[u.eval("24 < Latitude < 30 and 79 < Longitude < 87")]
You have more options using Series.between:
u = df['Latitude'].str[:-1].astype(float)
v = df['Longitude'].str[:-1].astype(float)
# pandas >= 1.3 takes inclusive="neither"; older versions used inclusive=False
df[u.between(24, 30, inclusive="neither") & v.between(79, 87, inclusive="neither")]
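For anyone who wants to try the Series.between approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and adds one hypothetical row ("Ana", not in the original data) so that the filter has something to return. It also assumes pandas 1.3 or later, where inclusive takes the string "neither":

```python
import pandas as pd

# Sample rows from the question, plus one hypothetical row ("Ana")
# added only so that the filter has a match.
df = pd.DataFrame({
    "RecordType": ["L", "L", "L", "L"],
    "Latitude": ["28.2N", "34.3N", "54.2N", "26.5N"],
    "Longitude": ["70W", "56W", "72W", "82W"],
    "Name": ["Jon", "Dan", "Rachel", "Ana"],
})

# Strip the trailing hemisphere letter and convert to float, without touching df.
lat = df["Latitude"].str[:-1].astype(float)
lon = df["Longitude"].str[:-1].astype(float)

# Strict bounds on both ends: 24 < Latitude < 30 and 79 < Longitude < 87.
mask = lat.between(24, 30, inclusive="neither") & lon.between(79, 87, inclusive="neither")
filtered = df[mask]
```

Here filtered keeps the original string columns and contains only the row that passes both range checks. Jon's latitude is in range but his longitude (70) is not, so he is excluded.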
Upvotes: 3