Lookahead for URL and User Agent in rather complex Logfile

Question

I have this Regex: http://regexr.com/39rbe

1413323829.0907|172.168.1.0|  |somedomain.com|OK|0015e248f2484591f52ed37030001|st=bla&cp=huh%2Cs_de%2Cf_bt%2Ce_rc%2Ch_sub%2Cl_ol%2Ca_noapp%2Cp_npaid%2Ci_t-e&sv=i2&pt=CP&rf=www.google.de&r2=https%3A%2F%2Fwww.google.de%2F&ur=mydomain.de&xy=1366x768x24&lo=DE%asdaasdasdcb=0009&vr=306&id=guccjs&lt=1413373830843&ev=&cs=w2dwmo&mo=1&la=1413773766|i00=0615e248f8484591f52ed47030001%3B543e5f46%3B55966cde|Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/527.36 (KHTML, like Gecko) Chrome/37.0.2162.124 Safari/527.36|http://mydomain.de/uriPath|023|web|OK|OK

I am trying to capture the User Agent string when the URL field that follows it is http://mydomain.de/uriPath. My attempt so far, which does not work yet:

[^\|]+(?=https?:\/\/(?:www\.)?mydomain\.de[^\|]+)

nu11p01n73R · Accepted Answer

What about

\|[^|]+\|(?=https?:\/\/(?:www\.)?mydomain\.de[^\|]+)

For example: http://regex101.com/r/tF4jD3/5

If you don't want the leading and trailing | in the match, move them into lookaround assertions:

(?<=\|)[^|]+(?=\|https?:\/\/(?:www\.)?mydomain\.de[^\|]+)

This gives the output:

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/527.36 (KHTML, like Gecko) Chrome/37.0.2162.124 Safari/527.36

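What does it do?

(?<=\|) asserts that the match is preceded by |

[^|]+ matches one or more characters other than |

(?=\|https?:\/\/(?:www\.)?mydomain\.de[^\|]+) asserts that the match is followed by | and a mydomain.de URL, without making them part of the match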
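As a rough illustration, the lookaround version could be tried in Python like this. The question does not say which regex engine or language is in use, so Python's re module is an assumption here, and the names LINE, UA_BEFORE_URL and user_agent_for_mydomain are made up for the sketch. The sample line is a shortened version of the one in the question.

```python
import re

# Shortened version of the log line from the question: the long
# query-string field is abbreviated to "st=bla&ur=mydomain.de".
LINE = (
    "1413323829.0907|172.168.1.0|  |somedomain.com|OK|"
    "st=bla&ur=mydomain.de|"
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/527.36 "
    "(KHTML, like Gecko) Chrome/37.0.2162.124 Safari/527.36|"
    "http://mydomain.de/uriPath|023|web|OK|OK"
)

# Lookbehind: the field must start right after a "|".
# Lookahead: the field must be followed by "|" and a mydomain.de URL.
UA_BEFORE_URL = re.compile(
    r"(?<=\|)[^|]+(?=\|https?://(?:www\.)?mydomain\.de[^|]+)"
)


def user_agent_for_mydomain(line):
    """Return the field directly before the mydomain.de URL, or None."""
    match = UA_BEFORE_URL.search(line)
    return match.group(0) if match else None


print(user_agent_for_mydomain(LINE))
```

Note that the trailing [^|]+ requires at least one character after mydomain.de, so a bare http://mydomain.de field would not match. The ur=mydomain.de parameter inside the query string is not picked up, because it is not preceded by |http:// or |https://.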

EDIT

Using a capturing group (the User Agent is in group 1):

\|([^|]+)\|(?:https?:\/\/(?:www\.)?mydomain\.de[^\|]+)
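A minimal sketch of the capturing-group variant, again assuming Python's re module since the target language is not stated in the question. The sample string and the name extract_user_agent are hypothetical and only serve the example.

```python
import re

# Group 1 holds the field between the two "|" characters that sits
# directly in front of the mydomain.de URL.
PATTERN = re.compile(r"\|([^|]+)\|(?:https?://(?:www\.)?mydomain\.de[^|]+)")


def extract_user_agent(line):
    """Return capture group 1 (the User Agent field), or None if no match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


# Hypothetical, heavily shortened log line for illustration only.
sample = "OK|ExampleAgent/1.0 (test)|http://www.mydomain.de/uriPath|023"
print(extract_user_agent(sample))
```

The capturing group can be handy with engines that lack lookbehind support: the surrounding | characters and the URL are consumed by the overall match, but only group 1 is read.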
