Reputation: 45
I'm pretty new to all this python and Excel things. I have a really big excel sheet with 1 million rows, my goal is to create a list with a few for list (I think that is what I need to do) and the problem is the follow:
I have something like this:
I want to add to a list all the POINT_Y and POINT_X (gps coordinates) where the objectid is the same and when the objectid changes I want to add those values to another list.
I only manage to get the data from the excel, my problem is in the construction of the lists.
import pandas
from shapely.geometry import Point
polygon_sheet =pandas.read_excel("C:...\coords.xlsx")
for row in polygon_sheet.index:
data = (polygon_sheet['POINT_Y'][row]) + ", " + \
(polygon_sheet['POINT_X'][row])
I'm sure I need a nested for loop but I don't know how to obtain what I want, that is something like this:
id1 = [[41.15226404, 25.41327506], [41.15221095, 25.41363158], [41.15220779, 25.41395401], [...(all coords from id1)]]
id2= [[41.11617356, 25.53488120], [41.11541559, 25.53517417], [41.11494893, 25.53531324], [...(all coords from id2]]
And so on until all ID's are iterated.
Hope I could explain what I expect correctly. Appreciate any help
Upvotes: 0
Views: 288
Reputation: 17322
you can apply a custom function over each row in order to create a dict with your desired output:
from collections import defaultdict
id_points = defaultdict(list)
def extrct_points(row):
o_id, point_x, point_y = row
id_points[o_id].append([point_x, point_y])
polygon_sheet.apply(extrct_points, axis=1)
for example:
polygon_sheet = pd.DataFrame({'OBJECTID': [1, 1, 2, 2, 2, 3], 'POINT_X':[0.23, 0.123, 0.1, 0.1, 0.2, 1],
'POINT_Y':[0.223, 0.3123, 0.11, 0.11, 0.22, 11]})
from pprint import pprint
polygon_sheet.apply(extrct_points, axis=1)
pprint(id_points)
output:
defaultdict(<class 'list'>,
{1.0: [[0.23, 0.223], [0.123, 0.3123]],
2.0: [[0.1, 0.11], [0.1, 0.11], [0.2, 0.22]],
3.0: [[1.0, 11.0]]})
Upvotes: 1
Reputation: 3103
You can create a dict of lists.Try something like this:
import pandas
from shapely.geometry import Point
dt = {}
polygon_sheet =pandas.read_excel("C:...\coords.xlsx")
for row in polygon_sheet.index:
try:
dt[polygon_sheet['ID'][row]] += [[polygon_sheet['POINT_Y'][row], polygon_sheet['POINT_X'][row]]]
except:
dt.update({polygon_sheet['ID'][row]: [[polygon_sheet['POINT_Y'][row], polygon_sheet['POINT_X'][row]]]})
print(dt)
Upvotes: 1