Reputation: 153
I am new to pandas. I have a CSV file with latitude and longitude columns and a tile ID column; the file has around 1 million rows. I have a list of around a hundred tile IDs and want to get the latitude and longitude coordinates for those tile IDs. Currently I have:
good_tiles_str = [str(q) for q in good_tiles]  # convert list elements to strings
file['tile'] = file.tile.astype(str)  # convert the tile column to strings
for i in range(len(good_tiles_str)):
    x = good_tiles_str[i]
    lat = file.loc[file['tile'].str.contains(x), 'BL_Latitude']  # finding lat coordinates
    long = file.loc[file['tile'].str.contains(x), 'BL_Longitude']  # finding long coordinates
    print(lat)
    print(long)
This method is very slow, and I know it is not the correct way, as I have heard you should not use for loops like this with pandas. It also does not work: it doesn't find all the latitude and longitude points for the tile IDs.
Any help would be greatly appreciated.
Upvotes: 0
Views: 136
Reputation: 3399
Try this:
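# build one regex that matches any of the tile IDs
search_for = '|'.join(good_tiles_str)
good = file[file.tile.str.contains(search_for)]
# keep only the coordinate columns, one row per unique pair
good = good[['BL_Latitude', 'BL_Longitude']].drop_duplicates()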
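Note that str.contains does substring (regex) matching, so a tile ID like '101' would also match '1010'. If the IDs need to match exactly, Series.isin may be a better fit. Here is a minimal sketch, assuming a small made-up DataFrame in place of the real CSV (the sample values and the good_tiles list are hypothetical):

```python
import pandas as pd

# Hypothetical sample data standing in for the 1-million-row CSV.
file = pd.DataFrame({
    "tile": [101, 102, 1010, 103, 101],
    "BL_Latitude": [51.0, 51.1, 51.2, 51.3, 51.0],
    "BL_Longitude": [-1.0, -1.1, -1.2, -1.3, -1.0],
})
good_tiles = [101, 103]  # hypothetical list of wanted tile IDs

# Compare as strings on both sides so the dtypes agree.
good_tiles_str = [str(q) for q in good_tiles]
mask = file["tile"].astype(str).isin(good_tiles_str)

# Select the matching rows in one vectorized step and drop exact duplicates.
good = file.loc[mask, ["tile", "BL_Latitude", "BL_Longitude"]].drop_duplicates()
print(good)
```

Keeping the tile column in the result makes it easy to see which coordinates belong to which tile ID.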
Upvotes: 0
Reputation: 113
As far as I understand your question, there is no need to iterate over the rows explicitly.
If you want to assign a value based on a condition, you can do so directly. Here's one way using numpy.where; we use ~ to negate a condition ("not").
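import numpy as np

# x is one tile ID string, as in the question's loop
rule1 = file['tile'].str.contains(x)
rule2 = file['tile'].str.contains(x)  # same condition as rule1 as written, so rule2 & ~rule1 is never True
file['flag'] = np.where(rule1, 'BL_Latitude', " ")
# rows where rule2 holds but rule1 does not; all other rows keep their current flag
file['flag'] = np.where(rule2 & ~rule1, 'BL_Longitude', file['flag'])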
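To make the numpy.where pattern concrete, here is a small self-contained sketch that flags rows whose tile ID is in the wanted list. The sample DataFrame, the ID list, and the 'good'/'other' labels are made up for illustration and are not from the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; tile IDs are already strings here.
file = pd.DataFrame({"tile": ["101", "102", "1010", "103"]})
good_tiles_str = ["101", "103"]  # hypothetical wanted tile IDs

# Boolean mask: True where the tile ID is exactly one of the wanted IDs.
wanted = file["tile"].isin(good_tiles_str)

# np.where(condition, value_if_true, value_if_false), applied to every row at once.
file["flag"] = np.where(wanted, "good", "other")

# ~ negates the mask, selecting the rows that are not wanted.
not_wanted = file[~wanted]
print(file)
```

The mask is computed once for the whole column, so no Python-level loop over rows or tile IDs is needed.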
Upvotes: 1