forvas

Reputation: 10189

Matching filenames containing a particular string in delimited format

I created a program which makes backups of PostgreSQL databases. Each backup file is named like this:

prefix + database_name + creation_date + suffix

Here prefix (which is optional) and database_name are set by the user, creation_date is the date the backup was made, and suffix is one of .dump, .bz2, .gz or .zip.

The problem now is that the program has to let the user delete every backup of one particular database; that is, I have to find every backup file that has database_name in its name.

But a database's name could be, for example, '01', which is a big problem if the prefix or the creation_date also contains '01'. For this reason I can't just check whether database_name appears somewhere in the filename.
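To make the pitfall concrete, here is a minimal sketch. The filename is hypothetical and assumes the parts are joined with underscores, which the naming scheme above does not actually specify.

```python
# Hypothetical backup of a database called "sales", saved with the prefix "01".
# The underscore-separated layout is an assumption made for illustration.
filename = "01_sales_20140101_010101_CEST.dump"
database_name = "01"

# Naive substring test: it succeeds because "01" occurs in the prefix
# (and in the date), even though this backup belongs to "sales".
naive_match = database_name in filename
print(naive_match)
```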

Any ideas on how to solve this? I've racked my brains but can't come up with anything.

Upvotes: 1

Views: 55

Answers (1)

Tim Pietzcker

Reputation: 336498

You need to split the string into its parts; a regex can help with that.

For example, assuming the parts are joined with underscores and the timestamp looks like 20140409_115239_CEST:

import re
regex = re.compile(r"^([^_]*)_(.*)_(\d{8}_\d{6}_[^_]+\.(?:dump|bz2|gz|zip))$")

You can then check whether the second group (the database name) matches the query:

>>> regex.search("prefix_dbname_20140409_115239_CEST.dump").groups()
('prefix', 'dbname', '20140409_115239_CEST.dump')

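So if regex.search(myfile) finds a match and its groups()[1] is equal to your database name, you can delete myfile. Compare for equality rather than containment; otherwise a name like '01' would again match unrelated databases. Note that regex.search() returns None for filenames that don't fit the pattern, so check for that before calling groups().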
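As a rough sketch of how that check could be wrapped up, the helper below filters a list of filenames by database name. The function name backups_for_database and the sample filenames are hypothetical; the pattern is the one shown above, so the same underscore-delimited layout is assumed.

```python
import re

# Same pattern as above: prefix_dbname_YYYYMMDD_HHMMSS_TZ.ext
BACKUP_RE = re.compile(
    r"^([^_]*)_(.*)_(\d{8}_\d{6}_[^_]+\.(?:dump|bz2|gz|zip))$"
)


def backups_for_database(filenames, db_name):
    """Return the filenames whose database-name part equals db_name.

    Hypothetical helper: filenames that do not fit the pattern are skipped.
    """
    matches = []
    for name in filenames:
        m = BACKUP_RE.search(name)
        # search() returns None when the filename does not fit the pattern.
        if m is not None and m.group(2) == db_name:
            matches.append(name)
    return matches


# Made-up filenames for illustration only.
files = [
    "nightly_01_20140101_010101_CEST.dump",
    "nightly_sales_20140101_010101_CEST.gz",
    "01_sales_20140409_115239_CEST.zip",
    "notes.txt",
]
print(backups_for_database(files, "01"))
```

Only the first file is selected for '01', even though '01' also appears in the prefix and dates of the others. Because the pattern requires an underscore after the prefix, a file saved without any prefix part would need a separate pattern, and a prefix that itself contains underscores would be ambiguous.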

Explanation:

^                      # Start of string
([^_]*)                # Group 1 (prefix): Any number of characters except _
_                      # _
(.*)                   # Group 2 (dbname): Any number of characters
_                      # _
(                      # Group 3 (timestamp/extension):
 \d{8}_\d{6}           # Date_Time
 _[^_]+                # _Timezone
 \.(?:dump|bz2|gz|zip) # .Extension
)                      # End of group 3
$                      # End of string
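If you want to keep this explanation next to the pattern in your code, one option is re.VERBOSE, which ignores whitespace in the pattern and allows # comments. The sketch below also uses named groups; the names prefix, dbname and stamp are my own choice and not part of the original pattern.

```python
import re

# Same pattern, written with re.VERBOSE so the comments live in the regex.
# Group names (prefix, dbname, stamp) are illustrative choices.
verbose_regex = re.compile(
    r"""
    ^                         # Start of string
    (?P<prefix>[^_]*)         # Prefix: any number of characters except _
    _
    (?P<dbname>.*)            # Database name: any number of characters
    _
    (?P<stamp>                # Timestamp and extension:
      \d{8}_\d{6}             #   Date_Time
      _[^_]+                  #   _Timezone
      \.(?:dump|bz2|gz|zip)   #   .Extension
    )
    $                         # End of string
    """,
    re.VERBOSE,
)

m = verbose_regex.search("prefix_dbname_20140409_115239_CEST.dump")
print(m.group("dbname"))
```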

Upvotes: 1
