Worcestershire

Reputation: 151

Python: How do I iterate over several files with similar names (the variation in each name is the date)?

I wrote a program that filters files of tweets to pull location and time from specific ones. Each file contains one day's worth of tweets.

I would like to run this program over one year's worth of tweets, which would involve iterating over 365 files with names like this: 2011-**-**.tweets.dat.gz, with the stars representing the digits that complete each file name into the date for one day of the year.

Basically, I'm looking for code that will loop over 2011-01-01.tweets.dat.gz, 2011-01-02.tweets.dat.gz, ..., all the way through 2011-12-31.tweets.dat.gz.

What I'm imagining now is somehow telling the program to loop over all files with the name 2011-*.tweets.dat.gz, but I'm not sure exactly how that would work or how to structure it, or even if the * syntax is correct.

Any tips?

Upvotes: 3

Views: 1732

Answers (2)

kindall

Reputation: 184455

Easiest way is indeed with a glob:

from glob import iglob

for pathname in iglob("/path/to/folder/2011-*.tweets.dat.gz"):
    print(pathname)   # or do whatever
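
To confirm the syntax asked about in the question: in a glob pattern, * matches any run of characters within a file name, so 2011-*.tweets.dat.gz picks up every daily file for that year. One thing to watch is that glob makes no promise about the order of its results. Below is a minimal sketch, assuming Python 3 and that each file is gzipped text with one record per line. The names find_daily_files and count_lines are hypothetical, and count_lines only stands in for the real filtering step.

```python
import glob
import gzip
import os


def find_daily_files(folder, year=2011):
    """Return the daily tweet files for one year, in date order (hypothetical helper)."""
    pattern = os.path.join(folder, "{}-*.tweets.dat.gz".format(year))
    # glob returns matches in arbitrary order; YYYY-MM-DD names sort
    # chronologically when sorted as plain strings.
    return sorted(glob.glob(pattern))


def count_lines(pathname):
    """Placeholder for the real processing: count the lines in one gzipped file."""
    # Assumption: the file is gzipped text, so it is opened in text mode ("rt").
    with gzip.open(pathname, "rt") as handle:
        return sum(1 for _ in handle)
```

Usage would look something like: for pathname in find_daily_files("/path/to/folder"): print(pathname, count_lines(pathname)), with count_lines swapped for the actual tweet filter.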

Upvotes: 1

Ashwini Chaudhary

Reputation: 251186

Use the datetime module:

>>> from datetime import datetime, timedelta
>>> d = datetime(2011, 1, 1)
>>> while d < datetime(2012, 1, 1):
...     filename = "{}{}".format(d.strftime("%Y-%m-%d"), '.tweets.dat.gz')
...     print(filename)
...     d = d + timedelta(days=1)
...
2011-01-01.tweets.dat.gz
2011-01-02.tweets.dat.gz
2011-01-03.tweets.dat.gz
2011-01-04.tweets.dat.gz
2011-01-05.tweets.dat.gz
2011-01-06.tweets.dat.gz
2011-01-07.tweets.dat.gz
2011-01-08.tweets.dat.gz
2011-01-09.tweets.dat.gz
2011-01-10.tweets.dat.gz
    ...
    ...
2011-12-27.tweets.dat.gz
2011-12-28.tweets.dat.gz
2011-12-29.tweets.dat.gz
2011-12-30.tweets.dat.gz
2011-12-31.tweets.dat.gz
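
A possible advantage of generating the names from dates, rather than listing the folder, is that you can detect days whose file is absent. The sketch below is one way to do that, assuming the files all sit in a single folder. The names daily_filenames and missing_days are hypothetical helpers, not part of any library.

```python
import os
from datetime import date, timedelta


def daily_filenames(year=2011):
    """Yield one expected file name per calendar day of the given year."""
    day = date(year, 1, 1)
    while day.year == year:
        # date.isoformat() produces YYYY-MM-DD, matching the file naming scheme.
        yield day.isoformat() + ".tweets.dat.gz"
        day += timedelta(days=1)


def missing_days(folder, year=2011):
    """Return the expected file names that do not exist in the folder."""
    return [name for name in daily_filenames(year)
            if not os.path.exists(os.path.join(folder, name))]
```

Calling missing_days("/path/to/folder") before the main run gives a list of gaps to investigate, and looping over daily_filenames() processes the files in date order without any sorting.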

Upvotes: 1
