gaphalpa
gaphalpa

Reputation: 25

How do I update the values of a column in a data frame based on the values of another data frame in Python?

df1
  | location | latitude
0 | NaN      | 41.0
1 | NaN      | 43.0
2 | NaN      | 72.0
3 | NaN      | 74.0

df2
  | location | min_latitude | max_latitude
0 | point_a  | 40.0         | 45.0
1 | point_b  | 70.0         | 75.0

How can I update the location values in df1 based on the condition that latitude is between min_latitude and max_latitude just like the result below?

df1
  | location | latitude
0 | point_a  | 41.0
1 | point_a  | 43.0
2 | point_b  | 72.0
3 | point_b  | 74.0

I have tried using df1['location'].apply(lambda x: df2['location'] if df1['location'].between(df2['min_latitude'], df2['max_latitude']) else df1['location']), but it returns ValueError: Can only compare identically-labeled Series objects.

Upvotes: 0

Views: 43

Answers (1)

rwalroth
rwalroth

Reputation: 340

You don't need to use between, you can use boolean indexing and get what you want with:

for idx in df2.index:
    df1.loc[(df1["latitude"] > df2["min_latitude"][idx]) & 
            (df1["latitude"] < df2["max_latitude"][idx]), "location"] = df2["location"][idx]

It's a complicated looking one liner, but it's just a couple of comparisons to get the rows you want. Using loc here is important, otherwise pandas won't assign anything due to chaining

Upvotes: 1

Related Questions