D_P

Reputation: 862

How to validate that a file name starts with a certain prefix format?

I have a list of files with names like this.

["TYBN-220422-257172171.txt", "TYBN-120522-257172174.txt", "TYBN-320422-657172171.txt", "TYBN-220622-237172174.txt", "TYBN-FRTRE-FFF.txt",....]

I want to get only the files whose names have a format like this: TYBN-220422-257172171.txt

In other words, something like valid = "TYBN-{}-{}".format(numericvalue, numericvalue), where both placeholders are numeric values. I want only files of this type in the list.

Upvotes: 0

Views: 57

Answers (3)

Nick

Reputation: 147206

This is probably most easily done using a regex to match the desired format i.e.

TYBN-\d+-\d+\.txt$

which looks for a name starting with the characters TYBN- followed by one or more digits (\d+), a -, some more digits and then finishing with .txt.

Note that when using re.match (as in the code below), matches are automatically anchored to the start of the string and thus a leading ^ (start-of-string anchor) is not required on the regex.

In Python:

import re
filelist = ["TYBN-220422-257172171.txt",
            "TYBN-120522-257172174.txt",
            "TYBN-320422-657172171.txt",
            "TYBN-220622-237172174.txt",
            "TYBN-FRTRE-FFF.txt"
           ]
regex = re.compile(r'TYBN-\d+-\d+\.txt$')
valid = [file for file in filelist if regex.match(file)]

Output:

[
 'TYBN-220422-257172171.txt',
 'TYBN-120522-257172174.txt',
 'TYBN-320422-657172171.txt',
 'TYBN-220622-237172174.txt'
]
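
To illustrate the anchoring note above, here is a small sketch comparing match, search and fullmatch. The two test names (one with extra leading text, one with a trailing newline) are made up for the example and are not from the question.

```python
import re

pattern = re.compile(r'TYBN-\d+-\d+\.txt$')

# Hypothetical name with extra text in front of the expected prefix
prefixed = "old_TYBN-220422-257172171.txt"
match_result = pattern.match(prefixed)     # None: match() only tries position 0
search_result = pattern.search(prefixed)   # match object: search() scans the whole string

# $ also matches just before a trailing newline, so this name still passes
newline_name = "TYBN-220422-257172171.txt\n"
dollar_result = pattern.match(newline_name)

# fullmatch() requires the entire string to match, so no ^ or $ is needed
full_pattern = re.compile(r'TYBN-\d+-\d+\.txt')
full_result = full_pattern.fullmatch(newline_name)   # None

print(match_result is None, search_result is not None)
print(dollar_result is not None, full_result is None)
```

If the names might carry stray whitespace (for example when read line by line from a file), fullmatch is probably the stricter choice of the three.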

Upvotes: 1

TruBlu

Reputation: 523

Regex explanation:

  • ^ start of the string
  • $ end of the string
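  • \d matches a single digit; for ASCII input this is equivalent to [0-9]
  • + one or more of the preceding expression

import re

files = ["TYBN-220422-257172171.txt", "TYBN-120522-257172174.txt"]

# raw string so that \d and \. reach the regex engine unchanged
pattern = re.compile(r"^TYBN-\d+-\d+\.txt$")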

for f in files:
    if pattern.match(f):
        print(f + " matched naming convention.")
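
One detail worth knowing about \d: for str patterns in Python 3 it also matches non-ASCII decimal digits. The sketch below shows the difference, using a made-up file name written with Arabic-Indic digits. If only 0-9 should be accepted, either the re.ASCII flag or an explicit [0-9] class should do it.

```python
import re

# Hypothetical name using Arabic-Indic digits (U+0661 to U+0666) instead of 0-9
unicode_name = "TYBN-\u0661\u0662\u0663-\u0664\u0665\u0666.txt"

unicode_pattern = re.compile(r"^TYBN-\d+-\d+\.txt$")            # \d = any Unicode decimal digit
ascii_pattern = re.compile(r"^TYBN-\d+-\d+\.txt$", re.ASCII)    # \d restricted to 0-9
explicit_pattern = re.compile(r"^TYBN-[0-9]+-[0-9]+\.txt$")     # same restriction, spelled out

print(bool(unicode_pattern.match(unicode_name)))    # True
print(bool(ascii_pattern.match(unicode_name)))      # False
print(bool(explicit_pattern.match(unicode_name)))   # False
```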

Upvotes: 3

Sharim09

Reputation: 6224

Try this one.

lst = ["TYBN-220422-257172171.txt",  "TYBN-120522-257172174.txt", "TYBN-320422-657172171.txt", "TYBN-220622-237172174.txt", "TYBN-FRTRE-FFF.txt"]

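valid_format = ['TYBN', True, True]  # True marks a part that must be all digits
valid = []

for name in lst:
    # drop the extension, then split the name into its dash-separated parts
    parts = name.replace('.txt', '').split('-')
    if parts[0] == valid_format[0]:
        # compare the digit check of each remaining part with the expected flags
        if [p.isdigit() for p in parts[1:]] == valid_format[1:]:
            valid.append(name)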

print(valid)

OUTPUT:

['TYBN-220422-257172171.txt',
 'TYBN-120522-257172174.txt',
 'TYBN-320422-657172171.txt',
 'TYBN-220622-237172174.txt']
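
A slightly stricter variant of the same no-regex idea, as a sketch. The helper name is_valid_name and its prefix/suffix parameters are made up here. It assumes the name must end with .txt and consist of exactly three dash-separated parts, which the question implies but does not state outright.

```python
def is_valid_name(name, prefix="TYBN", suffix=".txt"):
    """Return True for names shaped like <prefix>-<digits>-<digits><suffix>."""
    if not name.endswith(suffix):
        return False
    stem = name[:-len(suffix)]      # strip the extension from the end only
    parts = stem.split("-")
    if len(parts) != 3:             # exactly: prefix, number, number
        return False
    head, first, second = parts
    return head == prefix and first.isdigit() and second.isdigit()


names = ["TYBN-220422-257172171.txt", "TYBN-120522-257172174.txt",
         "TYBN-320422-657172171.txt", "TYBN-220622-237172174.txt",
         "TYBN-FRTRE-FFF.txt"]

valid_names = [n for n in names if is_valid_name(n)]
print(valid_names)
```

Unlike replace('.txt', ''), this rejects names where .txt appears in the middle (e.g. "TYBN-1.txt-2") or is missing altogether.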

Upvotes: 0
