luneice

Reputation: 157

Can pandas groupby bin elements into a grid of array buckets?

df_trace holds the traces of many bikes in a city, and it looks like this:

         LATITUDE   LONGITUDE
0        24.521047  118.161504
1        24.520763  118.161702
2        24.520385  118.162053
3        24.519880  118.162450
4        24.519403  118.162808
...            ...         ...
2504933  24.513834  118.167296
2504934  24.548244  118.101738
2504935  24.548293  118.101706
2504936  24.534143  118.096402
2504937  24.610413  118.113175

I want to group the trace points into an array of buckets that acts as a grid over the city. The grid is 100 * 100, and each cell covers one region of the city's bounding box, so each cell measures (max_longitude - min_longitude) / 100 by (max_latitude - min_latitude) / 100:

bucket = [
  [[], [], ..., []],
  [[], [], ..., []],
  ...
  [[], [], ..., []]
]

Each row of df_trace should go into the cell whose region contains its longitude and latitude. Can pandas groupby do that?

Upvotes: 1

Views: 176

Answers (1)

anon01

Reputation: 11171

I think pd.cut will do what you want. An example:

import numpy as np
import pandas as pd

# generate random walk (bike) trajectory
x = np.cumsum(np.random.normal(size=100000)).reshape(-1,1)
y = np.cumsum(np.random.normal(size=100000)).reshape(-1,1)
df = pd.DataFrame(data=np.hstack([x,y]), columns=["x", "y"])

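# define lat/long regions; N edges give N - 1 bins per axis
N = 10
xbins = np.linspace(x.min(), x.max(), N)
ybins = np.linspace(y.min(), y.max(), N)
# include_lowest=True keeps the minimum value in the first bin instead of mapping it to NaN
df["lon"] = pd.cut(df.x, bins=xbins, include_lowest=True)
df["lat"] = pd.cut(df.y, bins=ybins, include_lowest=True)

# now you can groupby each lat/lon and do some operation, e.g. count points:
df.groupby(["lat","lon"]).count().sort_values("x", ascending=False)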

                                               x      y
lat                 lon
(-40.524, 8.152]    (502.077, 613.429]  10510  10510
(-89.201, -40.524]  (724.782, 836.134]   6787   6787
(-40.524, 8.152]    (613.429, 724.782]   5873   5873
(-137.877, -89.201] (836.134, 947.486]   5073   5073
(-89.201, -40.524]  (613.429, 724.782]   5025   5025
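
To get closer to the nested bucket list from the question, here is a rough sketch of the same pd.cut idea applied to the LATITUDE and LONGITUDE columns. The small df_trace below is only a stand-in built from the first rows shown in the question, and the row, col and bucket names are my own choices. Passing labels=False makes pd.cut return integer bin indices rather than intervals, so they can be used directly as list indices.

```python
import numpy as np
import pandas as pd

# Stand-in for df_trace (assumption: only the first rows from the question).
df_trace = pd.DataFrame({
    "LATITUDE": [24.521047, 24.520763, 24.520385, 24.519880, 24.519403],
    "LONGITUDE": [118.161504, 118.161702, 118.162053, 118.162450, 118.162808],
})

N = 100  # cells per side of the grid

# N + 1 evenly spaced edges give N cells along each axis.
lat_edges = np.linspace(df_trace["LATITUDE"].min(), df_trace["LATITUDE"].max(), N + 1)
lon_edges = np.linspace(df_trace["LONGITUDE"].min(), df_trace["LONGITUDE"].max(), N + 1)

# labels=False yields integer cell indices 0 .. N-1;
# include_lowest=True keeps the minimum value in cell 0.
df_trace["row"] = pd.cut(df_trace["LATITUDE"], bins=lat_edges,
                         labels=False, include_lowest=True)
df_trace["col"] = pd.cut(df_trace["LONGITUDE"], bins=lon_edges,
                         labels=False, include_lowest=True)

# Build the N x N bucket; each cell holds the df_trace index labels that fall in it.
bucket = [[[] for _ in range(N)] for _ in range(N)]
for (row, col), idx in df_trace.groupby(["row", "col"]).groups.items():
    bucket[row][col] = list(idx)
```

Here row 0 is the lowest latitude band and col 0 the lowest longitude band. For 2.5 million points you may not need the Python lists at all, since df_trace.groupby(["row", "col"]) already lets you aggregate per cell.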


Upvotes: 1
