kev

Reputation: 2881

Check whether column values are within range

Here's what I have in my dataframe:

RecordType    Latitude    Longitude    Name
  L             28.2N        70W       Jon
  L             34.3N        56W       Dan
  L             54.2N        72W       Rachel

Note: The dtype of all the columns is object.

Now, in my final dataframe, I only want to include those rows in which the Latitude and Longitude fall in a certain range (say 24 < Latitude < 30 and 79 < Longitude < 87).

My idea is to apply a function to all the values in the Latitude and Longitude columns to first get float values (28.2, etc.) and then compare them against my range. So I wrote the following:

def numbers(value):
    return float(value[:-1])

result[u'Latitude'] = result[u'Latitude'].apply(numbers)
result[u'Longitude'] = result[u'Longitude'].apply(numbers)

But I get the following warning:

Warning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I'm having a hard time understanding this since I'm new to Pandas. What's the best way to do this?

Upvotes: 2

Views: 1136

Answers (2)

James Dellinger

Reputation: 1261

As for why Pandas raised the "A value is trying to be set on a copy of a slice..." warning and how to avoid it:

First, using this syntax should prevent the warning:

result.loc[:,'Latitude'] = result['Latitude'].apply(numbers)

Pandas gave you the warning because result was probably created by slicing another dataframe, so Pandas cannot tell whether assigning to its Latitude/Longitude columns modifies the original data or only a temporary copy of it. The warning is triggered by the assignment, not by .apply() itself. The article you referenced (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) gives examples of why this could cause unexpected problems in certain situations.

Pandas recommends syntax that makes it explicit which object you are modifying, so that your dataframe ends up being modified in the manner you expect. The code I wrote above using .loc tells Pandas to set that column on result directly. If the warning still appears, result is most likely itself a slice of another dataframe, and creating it with an explicit .copy() should remove the ambiguity.
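A minimal sketch of the explicit-copy approach, in case it helps. The question does not show how result was built, so the source frame df and the RecordType filter below are assumptions made purely for illustration:

```python
import pandas as pd

# Hypothetical source frame; the question does not show how `result` was created.
df = pd.DataFrame({
    "RecordType": ["L", "L", "C"],
    "Latitude": ["28.2N", "34.3N", "54.2N"],
    "Longitude": ["70W", "56W", "72W"],
})

def numbers(value):
    """Drop the trailing hemisphere letter and convert the rest to float."""
    return float(value[:-1])

# Taking an explicit copy makes `result` an independent DataFrame,
# so assigning to its columns cannot be mistaken for a write to `df`.
result = df[df["RecordType"] == "L"].copy()
result["Latitude"] = result["Latitude"].apply(numbers)
result["Longitude"] = result["Longitude"].apply(numbers)
```

Here df keeps its original string values, and only result holds the converted floats.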

Upvotes: 2

cs95

Reputation: 402263

If you don't want to modify df, I would suggest getting rid of the apply and vectorising this. One option is using eval.

u = df.assign(Latitude=df['Latitude'].str[:-1].astype(float))
u['Longitude'] = df['Longitude'].str[:-1].astype(float)

df[u.eval("24 < Latitude < 30 and 79 < Longitude < 87")]
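For anyone who wants to try the eval approach end to end, here is a small self-contained sketch. The sample rows are made up (the third row, "Eve", is hypothetical and was added so that at least one row falls inside the range):

```python
import pandas as pd

# Sample data in the question's format; the last row is invented for illustration.
df = pd.DataFrame({
    "RecordType": ["L", "L", "L"],
    "Latitude": ["28.2N", "34.3N", "25.0N"],
    "Longitude": ["70W", "56W", "80.5W"],
    "Name": ["Jon", "Dan", "Eve"],
})

# Numeric copy of the frame; `df` itself is left untouched.
u = df.assign(
    Latitude=df["Latitude"].str[:-1].astype(float),
    Longitude=df["Longitude"].str[:-1].astype(float),
)

# eval returns a boolean Series, which is then used to index the original frame.
mask = u.eval("24 < Latitude < 30 and 79 < Longitude < 87")
filtered = df[mask]
```

Because the mask is applied to df rather than u, the filtered rows keep their original string values such as "25.0N".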

You have more options using Series.between:

u = df['Latitude'].str[:-1].astype(float)
v = df['Longitude'].str[:-1].astype(float)

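# inclusive='neither' (pandas >= 1.3) excludes both endpoints; older versions used inclusive=False
df[u.between(24, 30, inclusive='neither') & v.between(79, 87, inclusive='neither')]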
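The inclusive argument of Series.between decides whether the endpoints count as inside the range. A quick sketch of the difference, assuming pandas 1.3 or later, where the argument takes the strings "both", "neither", "left" or "right" (the values in lat are made up to sit exactly on the boundaries):

```python
import pandas as pd

# Two values exactly on the boundaries and one strictly inside.
lat = pd.Series([24.0, 28.2, 30.0])

# Strict comparison, 24 < x < 30, which is what the question asks for.
open_interval = lat.between(24, 30, inclusive="neither")

# Default behaviour, 24 <= x <= 30.
closed_interval = lat.between(24, 30, inclusive="both")
```

Only the middle value passes the strict check, while all three pass the default one.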

Upvotes: 3
