Pandas Load Latest Date Folder/CSV Files into Dataframe

Question

I have a zip file that contains multiple dated folders. Each folder contains a datestamp.txt file, which holds the date, and multiple CSV files.

For example:

In Archives.zip: \Folder1, \Folder2

In each folder:

DATESTAMP.txt

a.csv

b.csv

This zip file is dropped from upstream and contains multiple days of data. The date is stored in each folder's datestamp.txt file (just a datestamp like 20200903). How can I process only the CSV files for the latest date? (For example, with Folder1/datestamp.txt: 20200903 and Folder2/datestamp.txt: 20200904, I only want Folder2's CSV files.)

I tried reading the date from each txt file first so that I could sort them:

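import pandas as pd
from zipfile import ZipFile

# raw string so the backslash in the Windows path is not treated as an escape
zip_file = ZipFile(r'data\Archives.zip')

# one single-cell DataFrame per datestamp.txt, keyed by its path inside the zip
timestamp = {text_file.filename: pd.read_csv(zip_file.open(text_file.filename), header=None)
             for text_file in zip_file.infolist() if text_file.filename.endswith('.txt')}

# one DataFrame per CSV file, from every folder (not only the latest one)
dfs = {csv_file.filename: pd.read_csv(zip_file.open(csv_file.filename))
       for csv_file in zip_file.infolist() if csv_file.filename.endswith('.csv')}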

Is there a way to get the date directly from datestamp.txt and read only the latest a.csv and b.csv?

Thank you

jsmart · Accepted Answer

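Here is a way to find the latest date and its corresponding folder. Because the datestamps are in YYYYMMDD format, max() on the strings returns the most recent date. I used a defaultdict so the result also shows when more than one folder shares the latest date.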

from collections import defaultdict

# create test data
metadata = [
    'Folder1/datestamp.txt: 20200903', # Sept 3
    'Folder2/datestamp.txt: 20200904', # Sept 4
    'Folder3/datestamp.txt: 20200903', # Sept 3 also (impossible?)
]

# initial value is empty list; just append without checking first
latest = defaultdict(list)

for m in metadata:
    folder = m.split('/', 1)[0]
    datestamp = m.rsplit(' ', 1)[-1]
    latest[datestamp].append(folder)
    
print('max date  :', max(latest))
print('folder(s) :', latest[max(latest)])

Output:

max date  : 20200904
folder(s) : ['Folder2']
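The same grouping can be applied to the zip file itself rather than to test strings. What follows is only a minimal sketch, not part of the original answer. It assumes that each datestamp file holds a single YYYYMMDD string, that member names inside the archive use forward slashes, and that the file name may appear in either case (the question shows both DATESTAMP.txt and datestamp.txt). The function names folders_by_datestamp, latest_csv_members and load_latest_csvs are hypothetical helpers introduced for illustration.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def folders_by_datestamp(zf, stamp_name="datestamp.txt"):
    """Map each datestamp string to the folder(s) whose stamp file contains it."""
    by_date = defaultdict(list)
    for info in zf.infolist():
        path = PurePosixPath(info.filename)
        # assumption: match the stamp file name case-insensitively
        if path.name.lower() == stamp_name.lower():
            stamp = zf.read(info).decode("utf-8").strip()
            by_date[stamp].append(str(path.parent))
    return by_date


def latest_csv_members(zf):
    """Return the CSV member names inside the folder(s) with the newest datestamp."""
    by_date = folders_by_datestamp(zf)
    if not by_date:
        return []
    # YYYYMMDD strings sort chronologically, so max() on the keys is the latest date
    latest_folders = set(by_date[max(by_date)])
    return [
        info.filename
        for info in zf.infolist()
        if info.filename.lower().endswith(".csv")
        and str(PurePosixPath(info.filename).parent) in latest_folders
    ]


def load_latest_csvs(zip_path):
    """Read only the latest folder's CSV files into a dict of DataFrames."""
    # imported lazily so the two helpers above need only the standard library
    import pandas as pd

    with zipfile.ZipFile(zip_path) as zf:
        dfs = {}
        for name in latest_csv_members(zf):
            with zf.open(name) as fh:
                dfs[name] = pd.read_csv(fh)
        return dfs
```

With the layout from the question, calling load_latest_csvs(r'data\Archives.zip') should return a dict keyed by 'Folder2/a.csv' and 'Folder2/b.csv'. Only the small datestamp files are read from the other folders, so their CSV files are never parsed.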
