the_flash

Reputation: 59

Regex - how to exclude everything after a capture

I have some IIS logs in which I'm looking to extract the file path and file name from the cs_uri_stem field. An example IIS event is as follows:

2018-02-21 04:39:13 <IPv4> GET /www/images/flash_email_large.gif - 8030 - <IPv4> Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.3;+WOW64;+Trident/7.0;+.NET4.0E;+.NET4.0C;+.NET+CLR+3.5.30729;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.30729;+Microsoft+Outlook+16.0.4654;+ms-office;+MSOffice+16) 200 0 0 531

My regex is as follows:

.*?(GET|POST|HEAD|OPTIONS|PROPFIND)\s(?P<file_path>(?:[^\/]*\/)*)(?P<file_name>.*)\s-

but I'm getting extra characters after the file name (in this case, flash_email_large.gif). How can I exclude everything after the file name in my regex?

Thx

Upvotes: 1

Views: 35

Answers (1)

anubhava

Reputation: 784958

You can use this better-performing regex, which captures the file path and file name in two named groups:

\s(GET|POST|HEAD|OPTIONS|PROPFIND)\s(?P<file_path>\S*\/)(?P<file_name>\S+)\s-


Changes:

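  1. Replace the leading .*? with \s, so the match starts at the whitespace right before the HTTP method.
  2. Avoid the nested quantifier (?:[^\/]*\/)*. Its [^\/]* also matches whitespace, so the path can run past the URL; \S*\/ stops at the last slash of the URL instead.
  3. Replace the final .* with \S+, so file_name stops at the first whitespace instead of running on to the last " -" in the line.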
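
As a quick check of the changes above, here is a minimal Python sketch that runs both patterns against the sample line from the question. The user-agent string is shortened for readability, and the names LOG_LINE, ORIGINAL, IMPROVED and extract are only illustrative; Python's re module is assumed because the patterns use the (?P<name>...) syntax.

```python
import re

# Sample event from the question; the user-agent string is shortened here.
LOG_LINE = (
    "2018-02-21 04:39:13 <IPv4> GET /www/images/flash_email_large.gif "
    "- 8030 - <IPv4> Mozilla/4.0+(compatible;+MSIE+7.0) 200 0 0 531"
)

# Pattern from the question: the greedy .* runs on to the last " -".
ORIGINAL = re.compile(
    r".*?(GET|POST|HEAD|OPTIONS|PROPFIND)\s"
    r"(?P<file_path>(?:[^\/]*\/)*)(?P<file_name>.*)\s-"
)

# Pattern from the answer: \S cannot cross whitespace, so both groups
# stay inside the URL.
IMPROVED = re.compile(
    r"\s(GET|POST|HEAD|OPTIONS|PROPFIND)\s"
    r"(?P<file_path>\S*\/)(?P<file_name>\S+)\s-"
)


def extract(line, pattern=IMPROVED):
    """Return (file_path, file_name) for a log line, or None if no match."""
    match = pattern.search(line)
    if match is None:
        return None
    return match.group("file_path"), match.group("file_name")


print(extract(LOG_LINE, ORIGINAL))  # ('/www/images/', 'flash_email_large.gif - 8030')
print(extract(LOG_LINE))            # ('/www/images/', 'flash_email_large.gif')
```

The first result shows the extra characters described in the question; the second shows file_name ending at the first whitespace.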

Upvotes: 1
