How do I compare two columns at once against two different data frames in python (pandas)?

Question

df1 contains two columns of Lat and Long, and several thousand rows. df2 also contains two columns of lat and long with many rows. Essentially, df2 is a list of reference locations that I want to compare df1 with. I want to compare both the Latitude and Longitude of df1 with df2 to say their locations match, or say they don't. i.e.,

my_data = pd.read_csv('/path/to/file', usecols = ['Lat','Lon'])
reference_data = pd.read_csv('/path/to/file', usecols = ['Lat','Lon'])

In simpler words, I want to say that if the location in each row in my_data is present in reference_data, label it 1, else label it 0. Since this location has two components Lat and Long, they BOTH need to be present next to each other anywhere in the reference dataframe. Is there an easy one-liner?

jfbeltran · Accepted Answer

You could generate this by using the merge function to join the reference_data to my_data with an indicator.

new_df = pd.merge(my_data, reference_data, on=['Lat','Lon'], how='left', indicator='flag')

You'll get a dataframe that should look exactly like my_data but include a new column "flag" which either says "left_only" or "both".

To get it as a [0,1] label:

new_df['bin_flag'] = (new_df['flag']=='both').astype(int)

To my knowledge, there is not an actual one-liner for this one.

How do I compare two columns at once against two different data frames in python (pandas)?

Answers (2)

Related Questions