K saman
K saman

Reputation: 131

How can I find only the files that contains date for last day of the month, using Python?

I have a folder with hundreds of .xlsx files which are daily data from years ago.

The file name pattern is 'ABC DEF 20150501.xlsx'

How can I pick files from the last day of each months for all these years.

Sadly I have no clue how to do it.

import pandas as pd
import re

files = [f for f in os.listdir(path) if re.match(r'ABC DEF [0-9]+.xlsx', f)]
df = pd.DataFrame(files)
df.columns = ['file_name']
df['port_date'] = df['file_name'].apply(lambda x : x.split(' ')[3].split('.')[0])

I got only the date in one column, and run out of idea!

Any help will be appreciated.

Upvotes: 1

Views: 205

Answers (3)

norie
norie

Reputation: 9857

The following code uses a function that adds a day to the date and then compares the month of the new date to the month of the original date, if they are different the original date is the last day of the month.

import pandas as pd
from datetime import datetime, timedelta

def last_day(to_date):
    delta = timedelta(days=1)
    next_day = to_date + delta
    if to_date.month != next_day.month:
        return True
    return False

# read dummy filename data
df = pd.read_csv('test.csv')

df['port_date'] = pd.to_datetime(df['file_name'].str[8:16])

df['lastday'] = df['port_date'].apply(lambda dt: last_day(dt))

Upvotes: 2

jezrael
jezrael

Reputation: 863166

If need last row for each month and year extract datetimes, sorting and gruping by years and months with GroupBy.last:

df = pd.DataFrame({'file_name':['ABC DEF 20150501.xlsx',
                                'ABC DEF 20150701.xlsx',
                                'ABC DEF 20150711.xlsx']})

print (df)
               file_name
0  ABC DEF 20150501.xlsx
1  ABC DEF 20150701.xlsx
2  ABC DEF 20150711.xlsx

df['port_date'] = pd.to_datetime(df['file_name'].str.extract('(\d+)\.', expand=False))
df = df.sort_values('port_date')

df = (df.groupby([df['port_date'].dt.year, df['port_date'].dt.month])
        .last()
        .reset_index(drop=True))
print (df)
               file_name  port_date
0  ABC DEF 20150501.xlsx 2015-05-01
1  ABC DEF 20150711.xlsx 2015-07-11

Upvotes: 1

Vulwsztyn
Vulwsztyn

Reputation: 2271

Declare a function:

def contains_date_of_last_day(name):
  last_days = [
    '0131',
    '0228',
    '0331',
    '0430',
    '0531',
    '0630',
    '0731',
    '0831',
    '0930',
    '1031',
    '1130',
    '1231',
    ]
  for i in last_days:
    if i in name: return True
      return False
    

Check if contains date of last day using the function:

files = [f for f in os.listdir(path) if contains_date_of_last_day(f)]

Upvotes: 1

Related Questions