Matoe

Reputation: 2758

Finding match between optional tokens?

For the strings:

I have the current regex:

m/(handle:|chat_identifier:)(.+?)(:{2}|&)/

I am currently using $2 to obtain the value I want (the e-mail address in the first string and chat0123456789 in the second).
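For reference, the current approach can be sketched as a small self-contained Perl script. The two input strings are modelled on the ones used in the answers below, with a placeholder address (user@example.com) standing in for the obfuscated e-mail; the strings and the `extract_value` helper are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample inputs; the e-mail address is a placeholder.
my @samples = (
    'text::handle:user@example.com::text',
    'text::chat_identifier:chat0123456789&text',
);

# Apply the question's regex and return the second capture ($2), or undef
# when neither "handle:" nor "chat_identifier:" is present.
sub extract_value {
    my ($s) = @_;
    return $s =~ m/(handle:|chat_identifier:)(.+?)(:{2}|&)/ ? $2 : undef;
}

print extract_value($_), "\n" for @samples;
```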

Is there a better/faster/simpler way to solve this problem, though?

Upvotes: 4

Views: 122

Answers (4)

Kenosis

Reputation: 6204

If the values you want are always in the same position and it's safe to split on : and &, then perhaps the following will work for you:

use Modern::Perl;

say +( split /[:&]+/ )[2] for <DATA>;

__DATA__
text::handle:[email protected]::text
text::chat_identifier:chat0123456789&text

Output:

[email protected]
chat0123456789

Upvotes: 1

Francis Gagnon

Reputation: 3675

Looks like you already have a lot of good solutions here. The split method seems like the simplest, but depending on your requirements you could also use a more generic regex that breaks the string into its basic pieces. It also works for datatypes and property names other than the ones in your examples.

 ([^:]+)::([^:]+):([^:&]+)(?:::|&)\1

The capture groups are as follows:

  • Group 1: The datatype (the keyword "text" in your examples).
  • Group 2: The property name (the keywords "handle" and "chat_identifier" in your examples).
  • Group 3: The property value.
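A minimal sketch of how the three groups might be read out in Perl. The `parse_record` helper and the sample records are hypothetical; the `num::count:42::num` record is invented to show a datatype and property name other than the ones in the question, and the address is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (datatype, property name, value) using the
# generic regex from this answer, or an empty list when the record does not match.
# The \1 back-reference requires the same datatype to appear again at the end.
sub parse_record {
    my ($s) = @_;
    return $s =~ m/([^:]+)::([^:]+):([^:&]+)(?:::|&)\1/;
}

for my $record ('text::handle:user@example.com::text',
                'text::chat_identifier:chat0123456789&text',
                'num::count:42::num') {
    my ($type, $name, $value) = parse_record($record);
    print "$type / $name / $value\n";
}
```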

Upvotes: 1

Martin Ender

Reputation: 44279

For a regex solution, this one is slightly simpler and doesn't need to backtrack:

m/(handle|chat_identifier):([^:&]+)/

Note the slight difference: yours allows single colons within the value, mine doesn't (it stops at the first colon encountered). If that is not a problem, you can use my variant. Or as I mentioned in a comment, split at : and use the fourth element in the result.
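To see that difference concretely, here is a small sketch comparing the question's lazy pattern with this negated-class pattern on a value containing a single colon. The input `text::handle:a:b::text` and the helper names are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;

# The question's pattern: lazily extends the value until '::' or '&' follows.
sub value_lazy {
    my ($s) = @_;
    return $s =~ m/(handle:|chat_identifier:)(.+?)(:{2}|&)/ ? $2 : undef;
}

# This answer's pattern: a negated character class that stops at the first ':' or '&'.
sub value_class {
    my ($s) = @_;
    return $s =~ m/(handle|chat_identifier):([^:&]+)/ ? $2 : undef;
}

# Hypothetical input whose value contains a single colon.
my $s = 'text::handle:a:b::text';
printf "lazy:  %s\nclass: %s\n", value_lazy($s), value_class($s);
```

On that input the question's pattern captures `a:b`, while the negated-class pattern stops at the first colon and captures `a`.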

A version that, like yours, stops only at a double colon (or &) is this:

m/(handle|chat_identifier):((?:(?!::|&).)+)/

It's not as pretty, but it still avoids backtracking (the lookahead might make it slower, though; you will need to profile that if speed matters at all).
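A minimal sketch exercising the lookahead variant; the sample strings and the `value_lookahead` helper are hypothetical. Each character is consumed only if neither `::` nor `&` starts at the current position, so a single colon inside the value is kept.

```perl
use strict;
use warnings;

# Consume one character at a time as long as '::' or '&' does not begin here.
sub value_lookahead {
    my ($s) = @_;
    return $s =~ m/(handle|chat_identifier):((?:(?!::|&).)+)/ ? $2 : undef;
}

for my $s ('text::handle:a:b::text', 'text::chat_identifier:chat0123456789&text') {
    print value_lookahead($s), "\n";
}
```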

Upvotes: 2

gcbenison

Reputation: 11963

Whether it's "better" or not depends on the context, but you could take this approach: split the string on ":" and take the fourth element of the resulting list. That's arguably more readable than the regex and more robust if the third field can be something other than "handle" or "chat_identifier".
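A minimal sketch of the split approach, using placeholder sample strings and hypothetical helper names. One caveat worth checking against your data: with the `&`-terminated string, splitting on `:` alone leaves `&text` attached to the fourth field. Splitting on either `:` or `&` one character at a time (similar to the `[:&]+` answer above, but without the `+`) keeps the value at the fourth position for both strings.

```perl
use strict;
use warnings;

# Placeholder samples; the address stands in for the real one.
my $handle = 'text::handle:user@example.com::text';
my $chat   = 'text::chat_identifier:chat0123456789&text';

# Split on ':' only. '::' yields an empty field, so the value is at index 3.
sub fourth_field_colon {
    my ($s) = @_;
    my @f = split /:/, $s;
    return $f[3];
}

# Split on ':' or '&', one character at a time, so index 3 still holds the value.
sub fourth_field_colon_amp {
    my ($s) = @_;
    my @f = split /[:&]/, $s;
    return $f[3];
}

print fourth_field_colon($handle), "\n";
print fourth_field_colon($chat), "\n";
print fourth_field_colon_amp($chat), "\n";
```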

I think the speed would be very similar for either approach, and probably for almost any reasonable implementation in Perl. I'd want to show that speed was critical for this step before worrying about it.
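If you do want numbers before deciding, Perl's core Benchmark module can compare a regex and a split approach. This is a minimal sketch assuming the two sample strings are representative of your data (the placeholder address and the `by_regex`/`by_split` names are hypothetical); the table it prints will vary with your machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder sample data; substitute your real input for meaningful numbers.
my @lines = (
    'text::handle:user@example.com::text',
    'text::chat_identifier:chat0123456789&text',
);

# Regex approach: negated character class, value in $2.
sub by_regex {
    my @out;
    for (@lines) {
        push @out, $2 if /(handle|chat_identifier):([^:&]+)/;
    }
    return @out;
}

# Split approach: split $_ on runs of ':' or '&'; the value is the third field.
sub by_split {
    my @out;
    for (@lines) {
        my @f = split /[:&]+/;
        push @out, $f[2];
    }
    return @out;
}

# Both approaches must agree before their speed is worth comparing.
die "approaches disagree\n" unless join('|', by_regex()) eq join('|', by_split());

# Run each for at least 1 CPU second and print a relative-speed table.
cmpthese(-1, { regex => \&by_regex, split => \&by_split });
```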

Upvotes: 4
