Reputation: 18047
In GTFS (which defines public transportation schedules and geographic information), a station (parent_station) contains several stops (stop_id).
I am analyzing Paris GTFS data. All parent_station fields are blank:
mysql> SELECT DISTINCT parent_station FROM stops;
+----------------+
| parent_station |
+----------------+
| |
| 0 |
+----------------+
How do I specify parent stations for stops (that is, group stops into parent stations)?
mysql> SELECT * FROM stops LIMIT 10;
+---------+-----------+------------------------------------+-------------------------------------------+-----------+----------+---------------+----------------+
| stop_id | stop_code | stop_name | stop_desc | stop_lat | stop_lon | location_type | parent_station |
+---------+-----------+------------------------------------+-------------------------------------------+-----------+----------+---------------+----------------+
| 1166824 | | "Olympiades" | "91 rue de Tolbiac - 75113" | 48.826948 | 2.367038 | 0 | |
| 1166825 | | "Olympiades" | "91 rue de Tolbiac - 75113" | 48.826948 | 2.367038 | 0 | |
| 1166826 | | "Bibliotheque-Francois Mitterrand" | "Face au 62 rue du Chevaleret - 75113" | 48.829831 | 2.376120 | 0 | |
| 1166827 | | "Bibliotheque-Francois Mitterrand" | "Face au 62 rue du Chevaleret - 75113" | 48.829831 | 2.376120 | 0 | |
| 1166828 | | "Cour Saint-Emilion" | "Cour Chamonard - 75112" | 48.833314 | 2.387300 | 0 | |
| 1166829 | | "Cour Saint-Emilion" | "Cour Chamonard - 75112" | 48.833314 | 2.387300 | 0 | |
| 1166830 | | "Bercy" | "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 0 | |
| 1166831 | | "Bercy" | "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 0 | |
| 1166832 | | "Gare de Lyon" | "Gare SNCF - 75112" | 48.844652 | 2.373108 | 0 | |
| 1166833 | | "Gare de Lyon" | "Gare SNCF - 75112" | 48.844652 | 2.373108 | 0 | |
+---------+-----------+------------------------------------+-------------------------------------------+-----------+----------+---------------+----------------+
Stops 1166830 and 1166831 should belong to the same parent station because they have the same longitude and latitude.
One idea comes to mind: given a radius r, two stops belong to the same station if the distance d between them is less than r, i.e., d < r.
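The d < r idea could be sketched roughly as follows. This is only a minimal illustration, assuming the haversine formula for d and a union-find structure to merge stops; the function names haversine_m and group_by_radius and the 50 m radius are hypothetical choices, not part of GTFS.

```python
import math
from collections import defaultdict

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0  # mean Earth radius (assumed constant)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def group_by_radius(stops, r):
    """Group (stop_id, lat, lon) tuples whose pairwise distance d satisfies d < r."""
    parent = {stop_id: stop_id for stop_id, _, _ in stops}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair once; O(n^2), acceptable for a sketch only.
    for i, (id_a, lat_a, lon_a) in enumerate(stops):
        for id_b, lat_b, lon_b in stops[i + 1:]:
            if haversine_m(lat_a, lon_a, lat_b, lon_b) < r:
                parent[find(id_a)] = find(id_b)

    groups = defaultdict(list)
    for stop_id in parent:
        groups[find(stop_id)].append(stop_id)
    return sorted(sorted(members) for members in groups.values())

# Four rows taken from the stops table above.
stops = [
    ("1166830", 48.840543, 2.379409),
    ("1166831", 48.840543, 2.379409),
    ("1166832", 48.844652, 2.373108),
    ("1166833", 48.844652, 2.373108),
]
groups = group_by_radius(stops, r=50.0)
```

Note that merging is transitive here: if A is near B and B is near C, all three end up in one group even when A and C are farther apart than r.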
Any better ideas?
Upvotes: 0
Views: 730
Reputation: 962
Assuming you are sure that these stop entries are not duplicates but separate stops located inside one station, I propose the following solution: find each list of distinct stops that share the same name and location, then mark the first stop in the list as a "station" and the remaining stops in the list as stops inside that station.
The reference document will help you do this. As an example, here are the edited rows (changes marked with ^^^^):
| 1166830 | | "Bercy"| "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 1 | |
^^^
| 1166831 | | "Bercy"| "Place du Bataillon du Pacifique - 75112" | 48.840543 | 2.379409 | 0 | 1166830 |
^^^^^^^
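To apply the same edit to every group rather than by hand, something like the following sketch might work. It assumes the stops have been loaded as dictionaries keyed by the stops.txt column names; the helper name assign_parents is hypothetical.

```python
from collections import defaultdict

def assign_parents(stops):
    """Return copies of the stop rows with location_type/parent_station filled in.

    Stops sharing the same name and coordinates form one group; the first
    stop of each group (in input order) is marked as the station.
    """
    groups = defaultdict(list)
    for stop in stops:
        key = (stop["stop_name"], stop["stop_lat"], stop["stop_lon"])
        groups[key].append(stop["stop_id"])

    # Map every stop in a multi-stop group to the id of its station.
    station_of = {}
    for ids in groups.values():
        if len(ids) < 2:
            continue  # a lone stop is left untouched
        for stop_id in ids:
            station_of[stop_id] = ids[0]

    result = []
    for stop in stops:
        row = dict(stop)
        station_id = station_of.get(row["stop_id"])
        if station_id == row["stop_id"]:
            row["location_type"] = "1"   # this row becomes the station
            row["parent_station"] = ""
        elif station_id is not None:
            row["location_type"] = "0"   # ordinary stop inside the station
            row["parent_station"] = station_id
        result.append(row)
    return result

rows = [
    {"stop_id": "1166830", "stop_name": "Bercy", "stop_lat": "48.840543",
     "stop_lon": "2.379409", "location_type": "0", "parent_station": ""},
    {"stop_id": "1166831", "stop_name": "Bercy", "stop_lat": "48.840543",
     "stop_lon": "2.379409", "location_type": "0", "parent_station": ""},
]
edited = assign_parents(rows)
```

One caveat on this design: GTFS expects stop_times.txt to reference only stops with location_type 0, so if the promoted stop_id is used there, adding a separate new station row instead may be the safer variant.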
Upvotes: 2