Reputation: 157
I have a DataFrame df_trace, which represents many bike traces in a city. df_trace looks like this:
LATITUDE LONGITUDE
0 24.521047 118.161504
1 24.520763 118.161702
2 24.520385 118.162053
3 24.519880 118.162450
4 24.519403 118.162808
... ... ...
2504933 24.513834 118.167296
2504934 24.548244 118.101738
2504935 24.548293 118.101706
2504936 24.534143 118.096402
2504937 24.610413 118.113175
I want to group the trace points into an array of buckets, where the array can be seen as a grid laid over the city. The bucket array is a 100 * 100 grid, and each cell represents a region obtained by dividing the city's bounding box evenly, so in this case each cell has size (max_longitude - min_longitude) / 100 by (max_latitude - min_latitude) / 100:
bucket = [
[[], [], ..., []],
[[], [], ..., []],
...
[[], [], ..., []]
]
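To make the cell definition concrete, the sketch below computes a row and column index for a few of the sample points shown above. The helper name `cell_index` and the choice to clip points on the maximum edge into the last cell are assumptions for illustration, and the bounds are taken from this small sample rather than from the whole city.

```python
import numpy as np

def cell_index(values, n_cells=100):
    """Map each value to an integer cell index in [0, n_cells - 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    cell_size = (hi - lo) / n_cells  # e.g. (max_longitude - min_longitude) / 100
    idx = np.floor((values - lo) / cell_size).astype(int)
    # The maximum value would land in cell n_cells, so clip it into the last cell.
    return np.clip(idx, 0, n_cells - 1)

# First five rows of the sample shown above.
lat = [24.521047, 24.520763, 24.520385, 24.519880, 24.519403]
lon = [118.161504, 118.161702, 118.162053, 118.162450, 118.162808]

rows = cell_index(lat)  # row index of each point in the grid
cols = cell_index(lon)  # column index of each point in the grid
```

A point with indices `(rows[i], cols[i])` would then belong in `bucket[rows[i]][cols[i]]`.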
Each element of df_trace should be put into the cell of the above grid whose region contains its longitude and latitude. Can pandas groupby implement that?
Upvotes: 1
Views: 176
Reputation: 11171
I think pd.cut will do what you want. An example:
import numpy as np
import pandas as pd
# generate random walk (bike) trajectory
x = np.cumsum(np.random.normal(size=100000)).reshape(-1,1)
y = np.cumsum(np.random.normal(size=100000)).reshape(-1,1)
df = pd.DataFrame(data=np.hstack([x,y]), columns=["x", "y"])
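# define lat/long regions: N bins per axis need N + 1 edges
N = 10
xbins = np.linspace(x.min(), x.max(), N + 1)
ybins = np.linspace(y.min(), y.max(), N + 1)
# include_lowest=True keeps the minimum value in the first bin instead of making it NaN
df["lon"] = pd.cut(df.x, bins=xbins, include_lowest=True)
df["lat"] = pd.cut(df.y, bins=ybins, include_lowest=True)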
# now you can groupby each lat/lon and do some operation, e.g. count points:
df.groupby(["lat","lon"]).count().sort_values("x", ascending=False)
                                             x      y
lat                  lon
(-40.524, 8.152]     (502.077, 613.429]  10510  10510
(-89.201, -40.524]   (724.782, 836.134]   6787   6787
(-40.524, 8.152]     (613.429, 724.782]   5873   5873
(-137.877, -89.201]  (836.134, 947.486]   5073   5073
(-89.201, -40.524]   (613.429, 724.782]   5025   5025
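If you need the nested bucket list from the question rather than a grouped result, one possible sketch is to ask pd.cut for integer bin codes (labels=False) and fill the list from a groupby. The DataFrame below is a tiny stand-in built from a few of the question's sample rows, and `row`, `col`, and `n_cells` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for df_trace (values copied from the question's sample rows).
df_trace = pd.DataFrame({
    "LATITUDE": [24.521047, 24.520763, 24.519403, 24.548244, 24.610413],
    "LONGITUDE": [118.161504, 118.161702, 118.162808, 118.101738, 118.113175],
})

n_cells = 100  # cells per axis; n_cells bins need n_cells + 1 edges
lat_edges = np.linspace(df_trace["LATITUDE"].min(), df_trace["LATITUDE"].max(), n_cells + 1)
lon_edges = np.linspace(df_trace["LONGITUDE"].min(), df_trace["LONGITUDE"].max(), n_cells + 1)

# labels=False returns the integer bin number (0 .. n_cells - 1) instead of an Interval;
# include_lowest=True keeps the minimum value in the first bin.
df_trace["row"] = pd.cut(df_trace["LATITUDE"], bins=lat_edges, labels=False, include_lowest=True)
df_trace["col"] = pd.cut(df_trace["LONGITUDE"], bins=lon_edges, labels=False, include_lowest=True)

# bucket[row][col] holds the df_trace index labels of the points in that cell.
bucket = [[[] for _ in range(n_cells)] for _ in range(n_cells)]
for (row, col), group in df_trace.groupby(["row", "col"]):
    bucket[row][col] = group.index.tolist()
```

The points in cell (i, j) can then be retrieved with `df_trace.loc[bucket[i][j]]`. Storing index labels instead of copies of the rows keeps the nested list small, which matters with a few million points.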
Upvotes: 1