Andrew_457

Reputation: 87

Sort files and parse filenames in Python

I have a folder of CSV files whose names encode the date and hour at which a boy comes home each day during the summer holidays. For instance, andrew201507011700.csv tells me that he came home on 1 July at 17:00. My goal is to sort the files in the folder and then extract the timestamps encoded in the filenames.

For example, given these files in the folder:

andrew201509030515.csv
andrew201507011700.csv
andrew201506021930.csv
andrew201508110000.csv

I'd like to sort them by these timestamps:

andrew201506021930.csv
andrew201507011700.csv
andrew201508110000.csv
andrew201509030515.csv

and then, iterating over this sorted list of files, extract the timestamp as a column for each file's dataframe. For example, for the file andrew201506021930.csv I would like a column holding a basic native Python datetime:

datetime
2015:06:02:19:30

I tried the following: first split the filename and sort on the numeric value, then take the last 12 characters of its basename:

path_sort=sorted(os.listdir(path),key=lambda x: int(x.split('w')[0]))
for i in path_sort:
    fi=os.path.join(path_sort, i)
    return os.path.basename(fi)[-12:]

This seems wrong to me: I don't use any datetime method to sort the files. Moreover, it already throws an error at the line fi=os.path.join(path_sort, i):

AttributeError: 'list' object has no attribute 'endswith'

Upvotes: 1

Views: 1337

Answers (1)

Hakim

Reputation: 1314

Try this (you may want to tighten the regex if you're not sure all your filenames have the same format):

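from os import listdir
from os.path import isfile, join
import re

# Compile once: matches the first run of digits in a string.
NUMBER_RE = re.compile(r'(\d+)')

def extract_number(string):
    # Return the first run of digits in the filename as an int (e.g. 201507011700).
    return int(NUMBER_RE.findall(string)[0])

MyDir = 'exampls/'
# Keep regular files only, skipping subdirectories.
onlyfiles = [f for f in listdir(MyDir) if isfile(join(MyDir, f))]
# Sort by the numeric timestamp embedded in each filename.
sortedFiles = sorted(onlyfiles, key=extract_number)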
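
The snippet above only sorts the files. For the second half of the question (turning each filename into a datetime), something like the following might work. It is a minimal sketch, assuming every filename ends in exactly 12 digits (YYYYMMDDHHMM) followed by .csv; the names TIMESTAMP_RE, parse_timestamp and sorted_with_timestamps are made up for illustration, and the file list is hard-coded from the question rather than read from a folder.

```python
import re
from datetime import datetime

# Assumed filename pattern: any prefix, then 12 digits (YYYYMMDDHHMM), then ".csv".
TIMESTAMP_RE = re.compile(r'(\d{12})\.csv$')

def parse_timestamp(filename):
    """Return the datetime encoded in the filename, or None if it does not match."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), '%Y%m%d%H%M')

def sorted_with_timestamps(filenames):
    """Return (filename, datetime) pairs in chronological order, skipping non-matching names."""
    pairs = [(name, parse_timestamp(name)) for name in filenames]
    pairs = [(name, ts) for name, ts in pairs if ts is not None]
    return sorted(pairs, key=lambda pair: pair[1])

# File names taken from the question.
names = [
    'andrew201509030515.csv',
    'andrew201507011700.csv',
    'andrew201506021930.csv',
    'andrew201508110000.csv',
]
for name, ts in sorted_with_timestamps(names):
    # Print each file with its timestamp in the layout shown in the question.
    print(name, ts.strftime('%Y:%m:%d:%H:%M'))
```

Inside the loop you could then read each file and assign ts to a new column of its dataframe. As for the AttributeError in the question: os.path.join(path_sort, i) passes the sorted list as the first argument, where a directory string is expected; os.path.join(path, i) is probably what was intended.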

Upvotes: 1
