Reputation: 741
I am looking for something quite specific that I can't quite find the answer to.
I have two dataframes. One that contains an ID, latitude and longitude. The other has just an ID.
I want to store in a list the latitude and longitude as long as the ID from Dataframe A exists in Dataframe B using list comprehension. I can get the first part working fine, but matching the IDs appears to be causing a problem. This is what I have so far:
heat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows() if row['NBN Location Id'] in closed['SP Order Location ID']]
To me, that says store 'latitude' and 'longitude' from extract as long as the ID exists in the other dataframe (closed). However this causes no data to be retrieved. Can anyone guide me as to where I'm going wrong? If I exclude the last 'if' statement, it works fine. So how else am I supposed to be doing this if statement?
Thanks!
Upvotes: 4
Views: 10247
Reputation: 294248
Using @jezrael's data
ids = {*closed['SP Order Location ID']}
cols = ['latitude', 'longitude', 'NBN Location Id']
[p for *p, i in zip(*map(extract.get, cols)) if i in ids]
[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]
closed = pd.DataFrame({'SP Order Location ID':list('ace')})
extract = pd.DataFrame({'NBN Location Id':list('abcde'),
'latitude':['lat1','lat2','lat3','lat4','lat4'],
'longitude':['long1','long2','long3','long4','long4']})
Upvotes: 1
Reputation: 862591
I think list comprehension is not necessary, better and faster is use vectorized solution by filter by boolean indexing
with isin
and then convert to lists:
mask = extract['NBN Location Id'].isin(closed['SP Order Location ID'])
cheat_data = extract.loc[mask, ['latitude', 'longitude']].values.tolist()
Sample:
closed = pd.DataFrame({'SP Order Location ID':list('ace')})
print (closed)
SP Order Location ID
0 a
1 c
2 e
extract = pd.DataFrame({'NBN Location Id':list('abcde'),
'latitude':['lat1','lat2','lat3','lat4','lat4'],
'longitude':['long1','long2','long3','long4','long4']})
print (extract)
NBN Location Id latitude longitude
0 a lat1 long1
1 b lat2 long2
2 c lat3 long3
3 d lat4 long4
4 e lat4 long4
mask = extract['NBN Location Id'].isin(closed['SP Order Location ID'])
cheat_data = extract.loc[mask, ['latitude', 'longitude']].values.tolist()
print (cheat_data)
[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]
Your solution failed, because in in pandas check index values, not values of Series
, so need convert to list:
cheat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows()
if row['NBN Location Id'] in closed['SP Order Location ID'].tolist()]
print (cheat_data)
[['lat1', 'long1'], ['lat3', 'long3'], ['lat4', 'long4']]
#changed index values
closed = pd.DataFrame({'SP Order Location ID':list('ace')}, index=list('dbw'))
print (closed)
SP Order Location ID
d a
b c
w e
cheat_data = [[row['latitude'],row['longitude']] for index, row in extract.iterrows()
if row['NBN Location Id'] in closed['SP Order Location ID']]
print (cheat_data)
[['lat2', 'long2'], ['lat4', 'long4']]
Upvotes: 3