millisami

Ruby regex to extract match_group value?

I have two questions about regex.

  1. The match string is:

    "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, [email protected]"
    

    To extract the user_email value, I use this regex:

    \s+(?<email_from_header>\S+)
    

    and the match group value is:

    (space)[email protected]"
    

    How do I omit the leading space and the trailing " (double quote)?

  2. To extract the token, I use this regex:

    AUTH-TOKEN\s+(?<auth_token>\S+)
    

    and the match group value is:

    FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA,
    

    How do I remove the trailing comma (,)?


Answers (2)

the Tin Man

If your string has embedded double-quotes:

str = '"FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, [email protected]"'
str # => "\"FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, [email protected]\""

str[/^"(.+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^"(.+?),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^"([^,]+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"

str[/(user_email=.+)"/, 1] # => "[email protected]"
str[/(user_email=[^"]+)"/, 1] # => "[email protected]"
str[/user_email=([^"]+)"/, 1] # => "[email protected]"
match = str.match(/(?<user_email>user_email=(?<addr>.+))"/)
match # => #<MatchData "[email protected]\"" user_email:"[email protected]" addr:"[email protected]">
match['user_email'] # => "[email protected]"
match['addr'] # => "[email protected]"
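Because the address in the examples above is obfuscated, their outputs cannot be reproduced verbatim. Below is a runnable sketch of the quoted-string case that assumes a hypothetical value, user_email=jane@example.com, in place of the hidden one; the token is the one from the question.

```ruby
# Hypothetical header value: token, comma, then user_email=<address>,
# wrapped in literal double quotes. The address is made up for illustration.
str = '"FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=jane@example.com"'

# Token: skip the opening quote, then take the run of non-comma characters.
token = str[/^"([^,]+),/, 1]

# Address: everything after "user_email=" up to, but not including, the closing quote.
email = str[/user_email=([^"]+)"/, 1]

# Nested named captures: the outer group keeps the key, the inner one is the bare address.
match = str.match(/(?<user_email>user_email=(?<addr>[^"]+))"/)

p token              # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
p email              # => "jane@example.com"
p match[:user_email] # => "user_email=jane@example.com"
p match[:addr]       # => "jane@example.com"
```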

If it doesn't:

str = 'FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, [email protected]'
str # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, [email protected]"

str[/^(.+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^(.+?),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
str[/^([^,]+),/, 1] # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"

str[/(user_email=.+)/, 1] # => "[email protected]"
str[/(user_email=(.+))/, 2] # => "[email protected]"
str[/user_email=(.+)/, 1] # => "[email protected]"
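The three token patterns above agree only because the string contains a single comma. A small sketch, assuming a hypothetical header with an extra ", x=1" field appended, shows where the greedy version diverges:

```ruby
# Hypothetical header with a second comma-separated field; the address is made up.
str = 'FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=jane@example.com, x=1'

greedy  = str[/^(.+),/, 1]    # .+ is greedy, so it backtracks only as far as the LAST comma
lazy    = str[/^(.+?),/, 1]   # .+? is lazy, so it stops at the FIRST comma
negated = str[/^([^,]+),/, 1] # [^,]+ can never cross a comma

p greedy  # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=jane@example.com"
p lazy    # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
p negated # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
```

The negated class is usually the safest of the three here: it states the intent (a run of non-comma characters) directly instead of relying on backtracking.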

Or, for a bit more regex fun, with named captures:

match = str.match(/(?<user_email>user_email=(?<addr>.+))/)
match # => #<MatchData "[email protected]" user_email:"[email protected]" addr:"[email protected]">
match['user_email'] # => "[email protected]"
match['addr'] # => "[email protected]"

Regular expressions are a rich language, and there are usually many ways to write the same pattern. The challenge then becomes maintaining the pattern as the program matures. I recommend starting simple and expanding the pattern as your needs dictate. Don't start with a complex pattern and hope it works; getting a complex pattern right on the first try rarely succeeds.


Avinash Raj

  1. Use this regex:

    \s+\K(?<email_from_header>[^"]*)
    

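    \K discards everything matched before it from the overall match, so the leading whitespace is not part of the result. The negated character class [^"]* matches zero or more characters other than ", so the match stops before the closing quote.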

  2. Use this regex:

    AUTH-TOKEN\s+(?<auth_token>[^,]*)
    

    [^,]* matches zero or more characters other than a comma, so the match stops before the trailing comma.
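Both patterns can be tried directly in Ruby, whose regex engine (Onigmo, Ruby 2.0 and later) supports \K. The sketch below assumes hypothetical inputs with a made-up address, and for question 2 a string that includes the AUTH-TOKEN prefix. It compares the overall match with and without \K.

```ruby
# Question 1: hypothetical header value with a made-up address and a closing quote.
value = '"FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=jane@example.com"'

# Without \K the overall match (index 0) starts with the whitespace after the comma.
plain  = value.match(/\s+(?<email_from_header>[^"]*)/)
# With \K the overall match starts right after that whitespace.
with_k = value.match(/\s+\K(?<email_from_header>[^"]*)/)

# Question 2: hypothetical input that includes the AUTH-TOKEN prefix.
header = 'AUTH-TOKEN FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA, user_email=jane@example.com'
# String#[] accepts a capture name as its second argument.
token  = header[/AUTH-TOKEN\s+(?<auth_token>[^,]*)/, "auth_token"]

p plain[0]                   # => " user_email=jane@example.com"
p with_k[0]                  # => "user_email=jane@example.com"
p with_k[:email_from_header] # => "user_email=jane@example.com"
p token                      # => "FuR6UcUiduzPyenxCSzZbDXTge3f3t9ufA"
```

Because \s+ sits outside the named group, the group itself never contains the whitespace; \K only matters if you use the whole match. The trailing quote and comma are excluded by the negated classes [^"]* and [^,]*.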

