Ndivho

Regex, extract date from a file name

Sample text is:

Statement_10125229_20170807.pdf

I would like to get the date, 20170807.

I was able to extract the statement ID, 10125229, using (?<=_).*?(?=_). Now I would like to extract the date. I have tried (?<=_)\d*, but I still get the statement ID back as well.
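A minimal C# sketch of why this happens, assuming .NET 5 or later with top-level statements (the helper names FirstMatch and AllMatches are hypothetical). Regex.Match returns only the first match, and (?<=_)\d* first succeeds right after the first underscore, so it consumes the statement ID before it ever reaches the date. Regex.Matches shows that the date is only the second match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Regex.Match returns only the FIRST match of the pattern.
// (?<=_)\d* first succeeds right after the first underscore,
// so it returns the statement ID, not the date.
string FirstMatch(string input) =>
    Regex.Match(input, @"(?<=_)\d*").Value;

// Regex.Matches enumerates every non-overlapping match, left to right.
string[] AllMatches(string input) =>
    Regex.Matches(input, @"(?<=_)\d*")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

var fileName = "Statement_10125229_20170807.pdf";
Console.WriteLine(FirstMatch(fileName));                    // 10125229
Console.WriteLine(string.Join(", ", AllMatches(fileName))); // 10125229, 20170807
```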

Answers (1)

Wiktor Stribiżew

In general, either (?<=_)\d+(?=\.) or (?<=_)[^_]*(?=\.pdf) solves your issue. The (?<=_)\d+(?=\.) pattern matches one or more digits that are immediately preceded by _ and immediately followed by a dot (.). The (?<=_)[^_]*(?=\.pdf) pattern matches zero or more characters other than _ that are immediately preceded by _ and immediately followed by .pdf. Unlike (?<=_)\d*, neither pattern can match the statement ID, because 10125229 is followed by _ rather than by . or .pdf.
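A short C# sketch that runs both lookaround patterns, assuming .NET 5 or later with top-level statements; the helper names DigitsBeforeDot and TokenBeforePdf are hypothetical. The .txt file name is an illustrative input, not from the question: it shows that the first pattern works for any extension, while the second only matches names ending in .pdf.

```csharp
using System;
using System.Text.RegularExpressions;

// Digits preceded by "_" and followed by "." (works for any extension).
string DigitsBeforeDot(string input) =>
    Regex.Match(input, @"(?<=_)\d+(?=\.)").Value;

// Non-underscore characters preceded by "_" and followed by ".pdf" (PDF only).
string TokenBeforePdf(string input) =>
    Regex.Match(input, @"(?<=_)[^_]*(?=\.pdf)").Value;

// Match.Value is an empty string when the pattern does not match.
Console.WriteLine(DigitsBeforeDot("Statement_10125229_20170807.pdf"));      // 20170807
Console.WriteLine(TokenBeforePdf("Statement_10125229_20170807.pdf"));       // 20170807
Console.WriteLine(DigitsBeforeDot("Statement_10125229_20170807.txt"));      // 20170807
Console.WriteLine(TokenBeforePdf("Statement_10125229_20170807.txt") == ""); // True
```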

However, in C#, you can actually get the substring you need without a regex. You can use

using System.IO;
using System.Linq;
var text = "Statement_10125229_20170807.pdf";
var result = Path.GetFileNameWithoutExtension(text).Split('_').LastOrDefault();

With a regex, you can also go for a capturing approach:

using System.Text.RegularExpressions;
var result = Regex.Match(text, @"_(\d+)\.pdf$").Groups[1].Value; // "" if there is no match

See the C# demo online; both approaches yield 20170807.
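A self-contained sketch of both approaches side by side. This is not the linked demo: it assumes .NET 5 or later with top-level statements, and ViaSplit and ViaCapture are hypothetical names. The extra input Statement_draft.pdf is illustrative only; it shows how the two approaches differ when the last segment is not numeric.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Approach 1 (no regex): drop the extension, split on "_", keep the last piece.
string ViaSplit(string fileName) =>
    Path.GetFileNameWithoutExtension(fileName).Split('_').LastOrDefault();

// Approach 2: capture the digits between an "_" and a trailing ".pdf".
// Groups[1].Value is "" when the match fails.
string ViaCapture(string fileName) =>
    Regex.Match(fileName, @"_(\d+)\.pdf$").Groups[1].Value;

var sample = "Statement_10125229_20170807.pdf";
Console.WriteLine(ViaSplit(sample));   // 20170807
Console.WriteLine(ViaCapture(sample)); // 20170807

// On unexpected input, the split returns whatever follows the last "_",
// while the capture returns "" unless digits precede ".pdf".
Console.WriteLine(ViaSplit("Statement_draft.pdf"));         // draft
Console.WriteLine(ViaCapture("Statement_draft.pdf") == ""); // True
```

The split is simpler and needs no regex, but it trusts the naming convention; the capture also validates that the last segment is numeric.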
