Niveta
Niveta

Reputation: 13

How do I compare two columns at once against two different data frames in python (pandas)?

df1 contains two columns of Lat and Long, and several thousand rows. df2 also contains two columns of lat and long with many rows. Essentially, df2 is a list of reference locations that I want to compare df1 with. I want to compare both the Latitude and Longitude of df1 with df2 to say their locations match, or say they don't. i.e.,

my_data = pd.read_csv('/path/to/file', usecols = ['Lat','Lon'])
reference_data = pd.read_csv('/path/to/file', usecols = ['Lat','Lon'])

In simpler words, I want to say that if the location in each row in my_data is present in reference_data, label it 1, else label it 0. Since this location has two components Lat and Long, they BOTH need to be present next to each other anywhere in the reference dataframe. Is there an easy one-liner?

Upvotes: 1

Views: 1093

Answers (2)

Kacper Wolkowski
Kacper Wolkowski

Reputation: 1607

you can do also something like:

my_data.apply(lambda x: (x['Lat'] in reference_data['Lat'] and x['Lon'] in reference_data['Lon']) * 1.0, axis=1)

and then you can just assign it wherever you like.

or, the same way but maybe easier to see what's going on:

my_data.apply(lambda x: ((x['Lat'], x['Lon']) in zip(reference_data['Lat'], reference_data['Lon'])) * 1.0, axis=1)

Upvotes: 1

jfbeltran
jfbeltran

Reputation: 1818

You could generate this by using the merge function to join the reference_data to my_data with an indicator.

new_df = pd.merge(my_data, reference_data, on=['Lat','Lon'], how='left', indicator='flag')

You'll get a dataframe that should look exactly like my_data but include a new column "flag" which either says "left_only" or "both".

To get it as a [0,1] label:

new_df['bin_flag'] = (new_df['flag']=='both').astype(int)

To my knowledge, there is not an actual one-liner for this one.

Upvotes: 1

Related Questions