Reputation: 151
I wrote a program that filters files containing tweets and pulls the location and time from specific ones. Each file contains one day's worth of tweets.
I would like to run this program over one year's worth of tweets, which would involve iterating over 365 files with names like this: 2011-**-**.tweets.dat.gz, with the stars representing the digits that complete the file name to make it a date for each day in the year.
Basically, I'm looking for code that will loop over 2011-01-01.tweets.dat.gz, 2011-01-02.tweets.dat.gz, ..., all the way through 2011-12-31.tweets.dat.gz.
What I'm imagining now is somehow telling the program to loop over all files with the name 2011-*.tweets.dat.gz, but I'm not sure exactly how that would work or how to structure it, or even if the * syntax is correct.
Any tips?
Upvotes: 3
Views: 1732
Reputation: 184455
Easiest way is indeed with a glob:
from glob import iglob
for pathname in iglob("/path/to/folder/2011-*.tweets.dat.gz"):
    print(pathname)  # or do whatever
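One thing to watch with a glob is that the matches come back in no guaranteed order. Here is a minimal sketch for Python 3 that sorts them (dates in YYYY-MM-DD form sort chronologically as plain strings) and opens each gzipped file. The helper names tweet_files and count_lines are made up for illustration, and counting lines is only a stand-in for whatever your filter actually does:

```python
import glob
import gzip
import os


def tweet_files(folder, year=2011):
    """Return the paths of all <year>-*.tweets.dat.gz files in folder, sorted by name."""
    pattern = os.path.join(folder, "{}-*.tweets.dat.gz".format(year))
    # glob makes no promise about ordering, so sort explicitly.
    # ISO-style dates (YYYY-MM-DD) sort chronologically as strings.
    return sorted(glob.glob(pattern))


def count_lines(pathname):
    """Placeholder for the real per-file work: count lines in one gzipped file."""
    with gzip.open(pathname, "rt") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # "/path/to/folder" is the same placeholder path as in the snippet above.
    for pathname in tweet_files("/path/to/folder"):
        print(pathname, count_lines(pathname))
```

A glob only visits files that exist, so a day with no file is skipped silently.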
Upvotes: 1
Reputation: 251186
Use the datetime module:
>>> from datetime import datetime, timedelta
>>> d = datetime(2011, 1, 1)
>>> while d < datetime(2012, 1, 1):
...     filename = "{}{}".format(d.strftime("%Y-%m-%d"), '.tweets.dat.gz')
...     print(filename)
...     d = d + timedelta(days=1)
...
2011-01-01.tweets.dat.gz
2011-01-02.tweets.dat.gz
2011-01-03.tweets.dat.gz
2011-01-04.tweets.dat.gz
2011-01-05.tweets.dat.gz
2011-01-06.tweets.dat.gz
2011-01-07.tweets.dat.gz
2011-01-08.tweets.dat.gz
2011-01-09.tweets.dat.gz
2011-01-10.tweets.dat.gz
...
...
2011-12-27.tweets.dat.gz
2011-12-28.tweets.dat.gz
2011-12-29.tweets.dat.gz
2011-12-30.tweets.dat.gz
2011-12-31.tweets.dat.gz
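Generating the names yourself also lets you notice days that have no file at all. A rough sketch along the same lines, assuming all the files sit in a single folder; daily_filenames and missing_days are hypothetical helper names, not part of any library:

```python
import os
from datetime import date, timedelta


def daily_filenames(year=2011, suffix=".tweets.dat.gz"):
    """Yield one file name per calendar day of the given year, in order."""
    day = date(year, 1, 1)
    while day.year == year:
        yield day.isoformat() + suffix  # isoformat() gives YYYY-MM-DD
        day += timedelta(days=1)


def missing_days(folder, year=2011):
    """Return the expected file names that are not present in folder."""
    return [name for name in daily_filenames(year)
            if not os.path.isfile(os.path.join(folder, name))]
```

Calling missing_days("/path/to/folder") before the main run would list any gaps up front, instead of the loop failing partway through the year.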
Upvotes: 1