Megan Byers
Megan Byers

Reputation: 169

Pairing two Pandas data frames with an ID value

I am trying to put together a useable set of data about glaciers. Our original data comes from an ArcGIS dataset, and latitude/longitude values were stored in a separate file, now detached from the CSV with all of our data. I am attempting to merge the latitude/longitude files with our data set. Heres a preview of what the files look like.

This is my main dataset file, glims (columns dropped for clarity)

| ANLYS_ID | GLAC_ID        | AREA  |
|----------|----------------|-------|
| 101215   | G286929E46788S | 2.401 |
| 101146   | G286929E46788S | 1.318 |
| 101162   | G286929E46788S | 0.061 |

This is the latitude-longitude file, coordinates

| lat   |   long  |        glacier_id |
|-------|---------|----------------|
| 1.187 | -70.166 | G001187E70166S |
| 2.050 | -70.629 | G002050E70629S |
| 3.299 | -54.407 | G002939E70509S |

The problem is, the coordinates data frame has one row for each glacier id with latitude longitude, whereas my glims data frame has multiple rows for each glacier id with varying data for each entry.

I need every single entry in my main data file to have a latitude-longitude value added to it, based on the matching glacier_id between the two data frames.

Heres what I've tried so far.

glims = pd.read_csv('glims_clean.csv')
coordinates = pd.read_csv('LatLong_GLIMS.csv')

df['que'] = np.where((coordinates['glacier_id'] == 
glims['GLAC_ID']))

error returns: 'int' object is not subscriptable

and:

glims.merge(coordinates, how='right', on=('glacier_id', 'GLAC_ID'))

error returns: int' object has no attribute 'merge'

I have no idea how to tackle this big of a merge. I am also afraid of making mistakes because it is nearly impossible to catch them, since the data carries no other identifying factors.

Any guidance would be awesome, thank you.

Upvotes: 0

Views: 281

Answers (2)

ndr
ndr

Reputation: 1437

This should work

glims = glims.merge(coordinates, how='left', left_on='GLAC_ID', right_on='glacier_id')

Upvotes: 1

rafaelc
rafaelc

Reputation: 59274

This a classic merging problem. One way to solve is using straight loc and index-matching

glims = glims.set_index('GLAC_ID')

glims.loc[:, 'lat']  = coord.set_index('glacier_id').lat
glims.loc[:, 'long'] = coord.set_index('glacier_id').long

glims = glims.reset_index()

You can also use pd.merge

pd.merge(glims,
         coord.rename(columns={'glacier_id': 'GLAC_ID'}), 
         on='GLAC_ID')

Upvotes: 1

Related Questions