quantum231

Reputation: 2593

What regex matches the full path to a file whose alphanumeric name ends with digits?

I have a collection of strings, each an absolute path to a file. From this list, I want to select the paths whose filename ends with an underscore, followed by three digits, followed by the extension.

The filename must start with a letter, not a digit or an underscore. After that it can contain any number of letters, digits, and underscores. At the end, it must have an underscore followed by three digits before the extension.
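To make these rules concrete, here is a minimal plain-Python predicate that checks them without a regex, which can be handy for cross-checking whatever pattern you end up with. It is a sketch under stated assumptions: `filename_matches`, `ALLOWED_EXTENSIONS`, and `WORD_CHARS` are hypothetical names, only ASCII letters and digits are accepted, and paths are assumed to use `/` as the separator.

```python
import posixpath
import string

# Hypothetical helper restating the filename rules without a regex.
# Assumes ASCII letters/digits and POSIX-style "/" separators.
ALLOWED_EXTENSIONS = {"vhd", "vhdl", "sv", "v"}
WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def filename_matches(path: str) -> bool:
    name = posixpath.basename(path)        # text after the last "/"
    stem, dot, ext = name.rpartition(".")  # split off the extension
    if not dot or ext not in ALLOWED_EXTENSIONS:
        return False
    # Shortest valid stem is one letter + "_" + three digits, e.g. "a_001".
    if len(stem) < 5:
        return False
    head, suffix = stem[:-4], stem[-4:]
    # The last four characters must be "_" followed by three ASCII digits.
    if suffix[0] != "_" or not all(c in string.digits for c in suffix[1:]):
        return False
    # The name must start with a letter; the rest may be letters, digits, "_".
    return head[0] in string.ascii_letters and all(c in WORD_CHARS for c in head)
```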

I tried using \w, but it already includes digits, so it also matches files that do not end with digits. I only got as far as these two patterns, which are not good enough:

\w(_[0-9]{3})\.(vhd|vhdl|sv|v)$ 
(_[0-9]{3})\.(vhd|vhdl|sv|v)$
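One way to see why these two attempts fall short, assuming they are applied with a search-style function such as Python's `re.search`: neither pattern is anchored to the start of the filename, so nothing checks that the name begins with a letter. The path below is a hypothetical example whose name starts with a digit and should therefore be rejected.

```python
import re

# The two patterns from the question, unchanged.
ATTEMPT_1 = r"\w(_[0-9]{3})\.(vhd|vhdl|sv|v)$"
ATTEMPT_2 = r"(_[0-9]{3})\.(vhd|vhdl|sv|v)$"

# Hypothetical path whose filename starts with a digit (should NOT match).
bad_path = "./submodules/1_router_001.sv"

for pattern in (ATTEMPT_1, ATTEMPT_2):
    # re.search looks for a match anywhere in the string; only the tail
    # "r_001.sv" / "_001.sv" is examined, so the leading "1" is never checked.
    print(pattern, bool(re.search(pattern, bad_path)))
```

Both lines print `True`, which is the false positive the rules are meant to exclude.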

Here are a few examples:

These should not match:

./Qsys_Systems/MY_SYS_TB/simulation/submodules/1_MY_SYS_TB_mm_interconnect_0_router.sv
./Qsys_Systems/MY_SYS_TB/simulation/submodules/_MY_SYS_TB_mm_interconnect_0_router.sv
./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router.sv

These should match:

./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router_001.sv
./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router_002.sv
./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router_004.sv

Upvotes: 1

Views: 226

Answers (1)

Wiktor Stribiżew

Reputation: 627468

According to your description, you can use

^(?:.*/)?[a-zA-Z]\w*_\d{3}\.(?:vhd|vhdl|sv|v)$

See this regex demo.

Details:

  • ^ - string start
  • (?:.*/)? - an optional sequence of zero or more characters other than line break characters (as many as possible), followed by a / character
  • [a-zA-Z] - a letter
  • \w* - zero or more letters/digits/_
  • _ - underscore
  • \d{3} - three digits
  • \. - a literal . character
  • (?:vhd|vhdl|sv|v) - one of the extensions
  • $ - end of string.
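As a quick check, here is a minimal Python sketch that applies the pattern above to the six example paths from the question and keeps only the accepted ones. It assumes Python's `re` module; note that in Python `\w` and `\d` are Unicode-aware by default for `str` patterns, so pass `re.ASCII` if only ASCII letters and digits should count.

```python
import re

# The pattern from this answer, compiled once.
PATTERN = re.compile(r"^(?:.*/)?[a-zA-Z]\w*_\d{3}\.(?:vhd|vhdl|sv|v)$")

paths = [
    "./Qsys_Systems/MY_SYS_TB/simulation/submodules/1_MY_SYS_TB_mm_interconnect_0_router.sv",
    "./Qsys_Systems/MY_SYS_TB/simulation/submodules/_MY_SYS_TB_mm_interconnect_0_router.sv",
    "./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router.sv",
    "./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router_001.sv",
    "./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router_002.sv",
    "./Qsys_Systems/MY_SYS_TB/simulation/submodules/MY_SYS_TB_mm_interconnect_0_router_004.sv",
]

# Keep only the paths the pattern accepts.
selected = [p for p in paths if PATTERN.search(p)]
for p in selected:
    print(p)
```

Only the last three paths (the `_001`, `_002`, and `_004` files) are printed.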

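If your regex flavor is standard POSIX ERE, which has no non-capturing groups and no \d or \w shorthands, use capturing groups, [0-9] for digits, and bracket expressions for the letter and word-character classes: ^(.*/)?[[:alpha:]][_[:alnum:]]*_[0-9]{3}\.(vhd|vhdl|sv|v)$.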

If your regex flavor supports Unicode property classes, you may replace [a-zA-Z] with \p{L}, which matches any Unicode letter. Alternatively, where POSIX character classes are supported, use [[:alpha:]].
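Python's built-in `re` module does not support `\p{L}` (the third-party `regex` package does). A common stand-in is `[^\W\d_]`, "a word character that is not a digit or an underscore". This is a sketch of that substitution, not an exact equivalent: it can also admit a few numeric characters that are not decimal digits, such as superscript digits. The paths are hypothetical examples.

```python
import re

# [^\W\d_] approximates \p{L} ("any letter") in Python's built-in re module.
UNICODE_PATTERN = re.compile(r"^(?:.*/)?[^\W\d_]\w*_\d{3}\.(?:vhd|vhdl|sv|v)$")

# Hypothetical paths; the first filename is "Élément_001.vhd".
for path in ("./src/\u00c9l\u00e9ment_001.vhd", "./src/1abc_001.vhd", "./src/_abc_001.vhd"):
    print(path, bool(UNICODE_PATTERN.search(path)))
```

Only the first path is accepted: it starts with the non-ASCII letter `É`, while the other two start with a digit and an underscore.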

See this regex demo.

Upvotes: 2
