Reputation: 20565
I sometimes have a very large GTFS zip file, valid for a period of 6 months, but loading that much data into a low-resource EC2 server (for example, 2 GB of memory and a 10 GB hard disk) is not economical.
I would like to split this large GTFS feed into 3 smaller GTFS zip files, each covering 2 months (6 months / 3 files) of valid data. Of course, that means I will need to replace the data every 2 months.
I have found a Python program that achieves the opposite goal, MERGE, here: https://github.com/google/transitfeed/blob/master/merge.py (this is a very good Python project, by the way).
I would be very thankful for any pointers.
Best regards,
Dunn.
Upvotes: 2
Views: 1189
Reputation: 8054
Another, more recent option for processing large GTFS files is transitland-lib. It's written in the Go programming language, which is quite efficient at parsing huge GTFS feeds.
See the transitland extract command, which can take a number of arguments to cut an existing GTFS feed down to a smaller size:
% transitland extract --help
Usage: extract <input> <output>
  -allow-entity-errors
        Allow entities with errors to be copied
  -allow-reference-errors
        Allow entities with reference errors to be copied
  -create
        Create a basic database schema if none exists
  -create-missing-shapes
        Create missing Shapes from Trip stop-to-stop geometries
  -ext value
        Include GTFS Extension
  -extract-agency value
        Extract Agency
  -extract-calendar value
        Extract Calendar
  -extract-route value
        Extract Route
  -extract-route-type value
        Extract Routes matching route_type
  -extract-stop value
        Extract Stop
  -extract-trip value
        Extract Trip
  -fvid int
        Specify FeedVersionID when writing to a database
  -interpolate-stop-times
        Interpolate missing StopTime arrival/departure values
  -normalize-service-ids
        Create Calendar entities for CalendarDate service_id's
  -set value
        Set values on output; format is filename,id,key,value
  -use-basic-route-types
        Collapse extended route_type's into basic GTFS values
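To split by period with -extract-calendar, you first need the service_id values that are active in the window you want to keep. Here is a minimal Python sketch of that lookup, not part of transitland-lib. It assumes the feed contains a calendar.txt with the standard service_id, start_date and end_date columns (dates as YYYYMMDD), and it ignores calendar_dates.txt exceptions. The function name services_in_window is made up for illustration.

```python
import csv
import io
import zipfile


def services_in_window(feed_path, window_start, window_end):
    """Return service_ids from calendar.txt whose date range overlaps the window.

    All dates are GTFS-style YYYYMMDD strings, which compare correctly as text.
    Hypothetical helper: feeds that define service only in calendar_dates.txt
    are not handled here.
    """
    with zipfile.ZipFile(feed_path) as feed:
        with feed.open("calendar.txt") as raw:
            # utf-8-sig strips a byte-order mark if the feed has one
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            reader = csv.DictReader(text)
            return [
                row["service_id"]
                for row in reader
                # Two ranges overlap when each starts before the other ends
                if row["start_date"] <= window_end
                and row["end_date"] >= window_start
            ]
```

Each returned id could then be given to -extract-calendar, assuming the flag accepts one service_id per use (the help output above does not say whether it can be repeated).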
Upvotes: 0
Reputation: 7887
It's worth noting that entries in stop_times.txt are usually the biggest memory hog when it comes to loading a GTFS feed. Since most systems do not replicate trips+stop_times for the dates when those trips are active, reducing the service calendar probably won't save you much.
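A quick way to check whether this applies to your feed is to list the uncompressed size of each file inside the zip. This is only a rough sketch using Python's standard zipfile module; the function name member_sizes is made up for illustration, and on-disk size is only a proxy for the memory a given loader will actually use.

```python
import zipfile


def member_sizes(feed_path):
    """Return (filename, uncompressed_bytes) pairs for a zip, largest first."""
    with zipfile.ZipFile(feed_path) as feed:
        sizes = [(info.filename, info.file_size) for info in feed.infolist()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Example use (the path is a placeholder):
# for name, size in member_sizes("gtfs.zip"):
#     print(f"{size:>14,d}  {name}")
```

If stop_times.txt dominates the list, trimming the calendar alone is unlikely to shrink the feed much unless the trips that use the removed services are dropped as well.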
That said, there are some tools for slicing and dicing GTFS. Check out the OneBusAway GTFS Transformer tool, for example.
Upvotes: 2