Kar

Reputation: 1014

How to get the filename without the date part in Python?

The filename can be any of the examples below:

abc_dec_2020_06_23.csv
efg_edd_20200623.csv
abc_20200623121935.csv

I need to extract only the name part, excluding the date/number part:

abc_dec_
efg_edd
abc_

I am trying to archive the previous file present in the SFTP location.

Below is the code I have so far:

fileName = self.s3_key.split('/')[-1]
# Move the existing file to the archive location, then write the new one
sftp_client.rename(self.sftp_path + fileName, archive_path + fileName)
with sftp_client.open(self.sftp_path + fileName, 'wb') as f:
    s3_client.download_fileobj(self.s3_bucket, self.s3_key, f)

Upvotes: 0

Views: 313

Answers (2)

AndrewQ

Reputation: 420

With a regular expression:

r"^[a-z_]+"

Example:

import re
regex_comp = re.compile(r"^[a-z_]+")
match_str = regex_comp.match("abc_20200623121935.csv")
print(match_str.group())

Result:

abc_
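Note that match() returns None when the name does not start with a lowercase letter or an underscore, so calling .group() on the result directly can raise AttributeError. A minimal sketch of a guarded version (the helper name leading_prefix is hypothetical, not part of the snippet above):

```python
import re

# Leading run of lowercase letters and underscores, same pattern as above.
PREFIX_RE = re.compile(r"^[a-z_]+")

def leading_prefix(filename):
    """Return the leading letters/underscores, or None if there are none."""
    match = PREFIX_RE.match(filename)
    return match.group() if match else None

print(leading_prefix("abc_20200623121935.csv"))  # abc_
print(leading_prefix("2020_report.csv"))         # None
```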

If the name part itself can contain digits, match the date at the end of the filename instead:

import re
filenames = ["efg_12_edd_20200623.csv", "abc_dec_2020_06_23.csv",
             "efg_edd_20200623.csv", "a1b2c11_20200623121935.csv"]

regex1 = re.compile(r"[0-9]{4}_[0-9]{2}_[0-9]{2}\.csv$")
regex2 = re.compile(r"[0-9]{8,14}\.csv$")

filename = ""
for filename_full in filenames:
    test = regex1.search(filename_full)
    if test is None:
        test = regex2.search(filename_full)
    if test is not None:
        filename = filename_full[:test.span()[0]]
        print(filename)
    else:
        print(filename_full, ": No match")

Result:

efg_12_edd_
abc_dec_
efg_edd_
a1b2c11_
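The two patterns can also be folded into a single alternation and removed with re.sub. This is only a sketch, assuming the date always sits directly before ".csv" as in the examples; strip_date is a hypothetical helper name:

```python
import re

# Either YYYY_MM_DD or a run of 8 to 14 digits, immediately before ".csv".
DATE_SUFFIX_RE = re.compile(r"(?:[0-9]{4}_[0-9]{2}_[0-9]{2}|[0-9]{8,14})\.csv$")

def strip_date(filename):
    """Drop the trailing date and extension; return the name unchanged if no date is found."""
    return DATE_SUFFIX_RE.sub("", filename)

for name in ["efg_12_edd_20200623.csv", "abc_dec_2020_06_23.csv",
             "a1b2c11_20200623121935.csv"]:
    print(strip_date(name))
```

A name without a recognisable date comes back unchanged, so the caller may want to compare the result with the input to detect the "no match" case.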

Upvotes: 2

MrNobody33

Reputation: 6483

You could try this:

file='abc_dec_2020_06_23.csv'
cleanfile=''
for let in file:
    if let.isdigit():
        break
    else:
        cleanfile+=let
  

print(cleanfile)

Output:

abc_dec_

And if your filenames have digits, you can try this:

x = 'abc_12_dec_2020_06_23.csv'
parts = x.split('_')
i = len(parts) - 1                      # index of the last piece
last = parts[i].replace('.csv', '')     # last piece without the extension
if len(last) < 8 and len(parts[i-1]) > 2:       # e.g. 202006_23.csv
    newval = '_'.join(parts[:i-1]) + '_'
elif len(last) < 8 and len(parts[i-1]) == 2:    # e.g. 2020_06_23.csv
    newval = '_'.join(parts[:i-2]) + '_'
elif len(last) < 8 and len(last) == 4:          # e.g. 2020_0623.csv
    newval = '_'.join(parts[:i-1]) + '_'
else:                                           # e.g. 20200623.csv
    newval = '_'.join(parts[:i]) + '_'
print(newval)

Output:

abc_12_dec_
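Whichever extraction is used, the resulting prefix can then pick out the earlier file to archive, which is what the question is after. A rough sketch that works on a plain list of names; how that list is obtained (for example from the SFTP client's directory listing) is an assumption, and files_to_archive is a hypothetical helper:

```python
def files_to_archive(existing_names, prefix, new_name):
    """Return names that share the prefix, excluding the file about to be uploaded."""
    return [name for name in existing_names
            if name.startswith(prefix) and name != new_name]

existing = ["abc_dec_2020_06_22.csv", "efg_edd_20200622.csv"]
print(files_to_archive(existing, "abc_dec_", "abc_dec_2020_06_23.csv"))
# ['abc_dec_2020_06_22.csv']
```

Be aware that a short prefix such as abc_ would also match abc_dec_... files, so a stricter comparison may be needed if prefixes overlap.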

Upvotes: 1
