Ram

Reputation: 731

Regular expression for text extraction

Could you please help me with a regular expression? I am new to this.

My requirement: I want to extract the vehicle number (i.e., 123456789) from the URL below:

mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS

I tried the following expression:

[&?]{1}validVehicle[=]{1}[^&]*[%3A]{1}([^%&]+)

But it gives incorrect results. Could you please help me with this?
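A quick Python check shows what the attempt actually captures. `[%3A]{1}` is a character class, so it matches one character that is `%`, `3` or `A`, not the three-character sequence `%3A`. The greedy `[^&]*` first runs to the end of the value, then backtracks until such a character is followed by at least one character other than `%` or `&`. Scanning back from the end, that first happens at the `A` of the last `%3A`, so group 1 ends up as the trailing field. The `{1}` quantifiers and `[=]` are harmless but redundant.

```python
import re

# Sample URL from the question.
URL = ("mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things"
       "&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS")

# The attempt from the question, unchanged. [%3A] matches ONE character
# ('%', '3' or 'A'), not the literal sequence "%3A".
ATTEMPT = re.compile(r"[&?]{1}validVehicle[=]{1}[^&]*[%3A]{1}([^%&]+)")

m = ATTEMPT.search(URL)
# Greedy [^&]* backtracks from the end to the 'A' of the last "%3A",
# so the capture group holds the last field, not the vehicle number.
print(m.group(1))  # LOS
```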

Upvotes: 0

Views: 58

Answers (2)

Scott Weaver

Reputation: 7361

A "structural" approach is to use the "%3A" sequences (URL-encoded colons) as the delimiters in the pattern, combined with non-greedy wildcards (.*?). This matches the fourth field of validVehicle as delimited by %3A, and assumes that structure does not change:

[&?]validVehicle=(?:.*?%3A){3}(.*?)%3A

Whether this approach or the \d{9} patterns suggested in the other answer is more useful depends on what you know for certain about the incoming data: a \d{9} pattern could also match nine digits appearing in other fields of that delimited value.
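A minimal Python sketch of the structural pattern, run against the sample URL from the question. It uses an uppercase `%3A` to match the URL as written; Python's `re` is case-sensitive by default, so a lowercase `%3a` in the pattern would only match with `re.IGNORECASE`. The helper name `fourth_field` is hypothetical.

```python
import re
from typing import Optional

# Sample URL from the question.
URL = ("mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things"
       "&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS")

# Skip three %3A-delimited fields non-greedily, then capture the fourth one.
STRUCTURAL = re.compile(r"[&?]validVehicle=(?:.*?%3A){3}(.*?)%3A")

def fourth_field(url: str) -> Optional[str]:
    """Hypothetical helper: return the fourth %3A-delimited field, or None."""
    m = STRUCTURAL.search(url)
    return m.group(1) if m else None

print(fourth_field(URL))  # 123456789
```

Because `.*?` can also cross `&`, a value with fewer than three delimiters could let the match run into later parameters; using `[^&]*?` instead would keep it inside the validVehicle value.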

Upvotes: 0

Wiktor Stribiżew

Reputation: 627400

A pure regex solution:

[&?]validVehicle=[^&]*(\d{9})

Or, if you are sure the digits appear after %3A and are not followed by a digit:

[&?]validVehicle=[^&]*%3A(\d{9})(?!\d)

See this regex demo and another regex demo. The value you seek is in Group 1.

Details:

  • [&?] - a ? or &
  • validVehicle= - a literal substring
  • [^&]* - any characters other than &, as many as possible (greedy, so the engine backtracks to the last position where the rest of the pattern can match)
  • %3A - literal substring
  • (\d{9}) - Group 1: 9 digits
  • (?!\d) - not followed by a digit.
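A small Python check of both patterns. The helper `group1` and the `TRICKY` input are hypothetical, for illustration only. On the question's URL both patterns return the same value. On a made-up value with a ten-digit field after the nine-digit one, the first pattern grabs the last nine digits of the value, while the second, anchored to `%3A` and guarded by `(?!\d)`, skips the ten-digit run.

```python
import re
from typing import Optional

# Sample URL from the question.
URL = ("mysite.com/resource?slk=121&ops=rewww&from=kld&to=aop&search=things"
       "&validVehicle=sdfdsdff-sdfdf-sddf%3AVX%3ALNCX%3A123456789%3AOPW%3ALOS")

# Pattern 1: greedy [^&]* backtracks, so \d{9} takes the LAST nine digits of the value.
ANY_NINE = re.compile(r"[&?]validVehicle=[^&]*(\d{9})")
# Pattern 2: the nine digits must follow %3A and must not be followed by another digit.
AFTER_DELIM = re.compile(r"[&?]validVehicle=[^&]*%3A(\d{9})(?!\d)")

def group1(pattern: re.Pattern, url: str) -> Optional[str]:
    """Hypothetical helper: return capture group 1 of the first match, or None."""
    m = pattern.search(url)
    return m.group(1) if m else None

print(group1(ANY_NINE, URL))     # 123456789
print(group1(AFTER_DELIM, URL))  # 123456789

# Hypothetical value: a ten-digit field follows the nine-digit one.
TRICKY = "?validVehicle=x%3A123456789%3A9876543210"
print(group1(ANY_NINE, TRICKY))     # 876543210
print(group1(AFTER_DELIM, TRICKY))  # 123456789
```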

Upvotes: 1
