Capturing content from a string

Question

I am attempting to parse some logs to get the specific catalog numbers for the items viewed. I have broken out all the necessary fields and am now parsing the referer field to get the catalog id of the page viewed.

The strings are in the following formats:

   /catalog/AAA1111111
   /catalog/BBB-22222-1/
   /catalog/CCC-333333/XXX
   http://url/catalog/DDD-44444444
   http://url/catalog/EEE-555555555/ZZZ

I am using the following regex to strip out the catalog id:

   .*\/catalog\/([^\/]+)

The problem is that I cannot stop the regex from grabbing everything after the next forward slash. Is it too greedy?

The results are:

   AAA1111111
   BBB-22222-1/
   CCC-333333/XXX
   DDD-44444444
   http:EEE-555555555/ZZZ

I've been banging my head on this one for a couple of hours.

I am just looking for a regex that will pull out only the catalog id (the string after catalog/).

Can anyone help guide this old coder in the proper direction?

Many thanks.

waTeim · Accepted Answer

Using sed:

   cat catalogs | sed -E 's/.*\/catalog\/([^/]+)\/?.*/\1/g'

results in:

   AAA1111111
   BBB-22222-1
   CCC-333333
   DDD-44444444
   EEE-555555555

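Note that the only modification is to also match the trailing text (the \/?.* at the end). A substitution replaces only the part of the line that the pattern matched, so anything left unmatched after the id stays in the output. Matching through to the end of the line means the whole line is replaced by the captured group.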
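For illustration, the same substitution can be wrapped in a small shell function and run against the sample strings from the question. This is only a sketch, not part of the original answer: the function name extract_catalog_id is hypothetical, it uses | as the sed delimiter so the slashes need no escaping, and it assumes a sed that supports -E (GNU and BSD sed both do).

```shell
#!/bin/sh
# extract_catalog_id is a hypothetical helper name used for illustration.
# It reads referer strings on stdin and prints one result per line.
extract_catalog_id() {
    # .*/catalog/  consumes everything up to and including "/catalog/"
    # ([^/]+)      captures the id: one or more non-slash characters
    # .*           consumes the rest of the line so it does not survive
    #              the substitution
    sed -E 's|.*/catalog/([^/]+).*|\1|'
}

# Sample inputs taken from the question.
printf '%s\n' \
    '/catalog/AAA1111111' \
    '/catalog/BBB-22222-1/' \
    '/catalog/CCC-333333/XXX' \
    'http://url/catalog/DDD-44444444' \
    'http://url/catalog/EEE-555555555/ZZZ' \
    | extract_catalog_id
```

Lines that do not contain /catalog/ pass through unchanged, because the substitution simply does not apply to them. If those lines should be dropped instead, one option is to run sed with -n and add the p flag to the substitution, so that only lines where a replacement happened are printed.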
