Reputation: 169
I am trying to put together a useable set of data about glaciers. Our original data comes from an ArcGIS dataset, and latitude/longitude values were stored in a separate file, now detached from the CSV with all of our data. I am attempting to merge the latitude/longitude files with our data set. Heres a preview of what the files look like.
This is my main dataset file, glims (columns dropped for clarity)
| ANLYS_ID | GLAC_ID | AREA |
|----------|----------------|-------|
| 101215 | G286929E46788S | 2.401 |
| 101146 | G286929E46788S | 1.318 |
| 101162 | G286929E46788S | 0.061 |
This is the latitude-longitude file, coordinates
| lat | long | glacier_id |
|-------|---------|----------------|
| 1.187 | -70.166 | G001187E70166S |
| 2.050 | -70.629 | G002050E70629S |
| 3.299 | -54.407 | G002939E70509S |
The problem is, the coordinates data frame has one row for each glacier id with latitude longitude, whereas my glims data frame has multiple rows for each glacier id with varying data for each entry.
I need every single entry in my main data file to have a latitude-longitude value added to it, based on the matching glacier_id between the two data frames.
Heres what I've tried so far.
glims = pd.read_csv('glims_clean.csv')
coordinates = pd.read_csv('LatLong_GLIMS.csv')
df['que'] = np.where((coordinates['glacier_id'] ==
glims['GLAC_ID']))
error returns: 'int' object is not subscriptable
and:
glims.merge(coordinates, how='right', on=('glacier_id', 'GLAC_ID'))
error returns: int' object has no attribute 'merge'
I have no idea how to tackle this big of a merge. I am also afraid of making mistakes because it is nearly impossible to catch them, since the data carries no other identifying factors.
Any guidance would be awesome, thank you.
Upvotes: 0
Views: 281
Reputation: 1437
This should work
glims = glims.merge(coordinates, how='left', left_on='GLAC_ID', right_on='glacier_id')
Upvotes: 1
Reputation: 59274
This a classic merging problem. One way to solve is using straight loc
and index-matching
glims = glims.set_index('GLAC_ID')
glims.loc[:, 'lat'] = coord.set_index('glacier_id').lat
glims.loc[:, 'long'] = coord.set_index('glacier_id').long
glims = glims.reset_index()
You can also use pd.merge
pd.merge(glims,
coord.rename(columns={'glacier_id': 'GLAC_ID'}),
on='GLAC_ID')
Upvotes: 1