tootiefairy

Reputation: 7

How do I adjust my code to parse dates with dashes?

In Python, I'm trying to create a file checker. I have two inputs: the filename and the pattern. The file pattern always has the date pattern enclosed in { }.

For example,

filename : File_20240621

pattern : File_{YYYYMMDD}

The function should extract the date from the filename. So in this case, it will return 20240621.

Now I need this function to be flexible enough to handle various date formats, for example YYYYMMDD, MMDD, DDMM, YYYY, MM and DD, as well as dates with dashes in them: YYYY-MM-DD, YYYY-MM and MM-DD.

This is my current function, but it does not work on dates with dashes. Can anyone help me with this?

import re


def extract_date_from_filename(filename, pattern):
    """
    Extracts the date part from the filename based on the provided pattern.
    Returns the date as a string.
    """
    # Extract the date format part from the pattern, assuming it is enclosed in {}
    date_pattern = re.search(r'\{(.*?)\}', pattern).group(1)

    # Create a regex pattern to extract the date from the filename
    regex_pattern = re.escape(pattern).replace(r'\{'+date_pattern+r'\}', r'(\d{'+str(len(date_pattern))+r'})')
    
    # Extract the date from the filename
    match = re.search(regex_pattern, filename)
    
    if match:
        # Return the extracted date string
        return match.group(1)
    else:
        return None

Upvotes: 0

Views: 54

Answers (1)

derteufelqwe

Reputation: 151

I suggest you take a look at Python's datetime module and its datetime.strptime(...) function. It parses a timestamp string according to a format string that you supply.
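
As a quick illustration of how strptime behaves on its own, here is a minimal sketch. The sample strings are made up for the example; the point is that dashes are just literal characters in the format, and that a string which does not match the format raises ValueError rather than returning None.

```python
from datetime import datetime

# strptime takes the string to parse and a format built from % directives;
# the dashes in the format are matched literally
parsed = datetime.strptime("2024-06-21", "%Y-%m-%d")
print(parsed.date())  # 2024-06-21

# A string that does not match the format raises ValueError
try:
    datetime.strptime("20240621", "%Y-%m-%d")
except ValueError as exc:
    print("no match:", exc)
```

If you would rather get None than an exception for bad filenames, wrap the strptime call in a try/except like the one above.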

import re
from datetime import datetime


def extract_date_from_filename(filename, pattern):
    """
    Extracts the date part from the filename based on the provided pattern.
    Returns the date as a datetime.date object, or None if there is no match.
    """
    # Extract the date format part from the pattern, assuming it is enclosed in {}
    date_format_pattern = re.search(r'\{(.*?)\}', pattern).group(1)

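    # Create a regex pattern to extract the date from the filename.
    # The literal parts are escaped so that characters like '.' match literally.
    prefix, suffix = pattern.split('{' + date_format_pattern + '}', 1)
    regex_pattern = re.escape(prefix) + r'(\S+)' + re.escape(suffix)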

    # Extract the date from the filename
    match = re.search(regex_pattern, filename)

    if match:
        # Parse the extracted string using the datetime format
        date_str = match.group(1)
        return datetime.strptime(date_str, date_format_pattern).date()
    else:
        return None


print(extract_date_from_filename('File_20240621', 'File_{%Y%m%d}'))
print(extract_date_from_filename('File_240621', 'File_{%y%m%d}'))
print(extract_date_from_filename('File_21-06-2023', 'File_{%d-%m-%Y}'))

I modified your function slightly to use the datetime module's format strings. The format codes are described in the datetime documentation (section "strftime() and strptime() Format Codes"). After the timestamp string is extracted (I adjusted the regex for that), it is parsed by the datetime module, so you have the flexibility to support more formats later on.
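
If you want to keep patterns written as YYYY-MM-DD instead of %Y-%m-%d, and still get the date back as a string, something along these lines might work. This is only a sketch: TOKEN_MAP, to_strptime_format and extract_date_string are hypothetical names I made up for the example, and it assumes the only tokens you use are YYYY, YY, MM and DD, with each letter standing for exactly one digit.

```python
import re
from datetime import datetime

# Hypothetical token table (an assumption, not part of the datetime module).
# Longer tokens come first so that YYYY is replaced before YY.
TOKEN_MAP = [("YYYY", "%Y"), ("YY", "%y"), ("MM", "%m"), ("DD", "%d")]


def to_strptime_format(date_pattern):
    """Translate e.g. 'YYYY-MM-DD' into '%Y-%m-%d'."""
    for token, directive in TOKEN_MAP:
        date_pattern = date_pattern.replace(token, directive)
    return date_pattern


def extract_date_string(filename, pattern):
    """Return the date text from filename, or None if it does not match."""
    token_match = re.search(r"\{(.*?)\}", pattern)
    if token_match is None:
        return None
    date_pattern = token_match.group(1)
    prefix, suffix = pattern.split(token_match.group(0), 1)

    # One \d per letter; separators such as '-' are kept as literal characters
    date_regex = "".join(
        r"\d" if ch in "YMD" else re.escape(ch) for ch in date_pattern
    )
    full_regex = re.escape(prefix) + "(" + date_regex + ")" + re.escape(suffix)
    match = re.fullmatch(full_regex, filename)
    if match is None:
        return None

    # Use strptime only to reject impossible dates such as month 13
    try:
        datetime.strptime(match.group(1), to_strptime_format(date_pattern))
    except ValueError:
        return None
    return match.group(1)


print(extract_date_string("File_20240621", "File_{YYYYMMDD}"))      # 20240621
print(extract_date_string("File_2024-06-21", "File_{YYYY-MM-DD}"))  # 2024-06-21
print(extract_date_string("File_2024-13-01", "File_{YYYY-MM-DD}"))  # None
```

One caveat: for formats without a year (MM-DD, DDMM and so on) strptime falls back to the year 1900, which is not a leap year, so a filename containing 02-29 would be rejected by the validation step.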

Upvotes: 1
