lapinkoira

Reputation: 8998

regex with tabs \t

I am trying to apply a regex with named groups to a string whose fields are separated by tabs, but I can't figure out how.

test = "2018-06-16T07:03:23.813056Z\thello\tworld"
Regex.named_captures(~r/(?<foo>)\\t(?<hello>)\\t(?<world>)/, test)
nil

Tab is a special character according to the docs (http://erlang.org/documentation/doc-5.7.4/lib/stdlib-1.16.4/doc/html/regexp.html), so I am not sure whether it can be done like that.

Upvotes: 0

Views: 5224

Answers (1)

Wiktor Stribiżew

Reputation: 627126

In your pattern, the named groups are empty, so they capture nothing but empty strings: (?<foo>) matches and captures an empty string. In addition, \\t inside ~r/.../ matches a literal backslash followed by the letter t, not a tab. So ~r/(?<foo>\\t)/ finds a match in ~S(2018-06-16T07:03:23.813056Z\thello), a string that contains the two characters \ and t but no tab.
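The two points above can be checked in a short sketch. The variable names no_tab and with_tab are made up for illustration; the calls are the standard Regex.named_captures/2 and Regex.match?/2.

```elixir
# An empty named group matches and captures the empty string.
empty = Regex.named_captures(~r/(?<foo>)/, "abc")
# empty is %{"foo" => ""}

# ~S(...) does no escaping, so this string holds a backslash and a "t", no TAB.
no_tab = ~S(2018-06-16T07:03:23.813056Z\thello)
# In a regular double-quoted string, "\t" is a real TAB character.
with_tab = "2018-06-16T07:03:23.813056Z\thello"

# Inside ~r/.../, \\t means: a literal backslash followed by "t".
matches_backslash_t = Regex.match?(~r/\\t/, no_tab)    # true
matches_real_tab    = Regex.match?(~r/\\t/, with_tab)  # false
```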

Besides, if you build a regex from a string literal with a constructor such as Regex.compile!, you can write a tab in two ways, "\t" or "\\t". The former is passed to the regex engine as a literal TAB character; the latter is passed as the regex escape made of a literal \ and t, which matches the same TAB character. So Regex.compile!("(?<foo>\t)") and Regex.compile!("(?<foo>\\t)") match the same text.
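As a minimal sketch of that equivalence, both compiled patterns capture the tab in the same input. The names tab_char, tab_escape and input are made up for illustration.

```elixir
# The regex engine receives a real TAB character.
tab_char = Regex.compile!("(?<foo>\t)")
# The regex engine receives backslash + t, i.e. the \t regex escape.
tab_escape = Regex.compile!("(?<foo>\\t)")

input = "a\tb"

from_char   = Regex.named_captures(tab_char, input)    # %{"foo" => "\t"}
from_escape = Regex.named_captures(tab_escape, input)  # %{"foo" => "\t"}
```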

You can match the non-tab chunks with [^\t]*:

~r/(?<foo>[^\t]*)\t(?<hello>[^\t]*)\t(?<world>[^\t]*)/
          ^^^^^^            ^^^^^^            ^^^^^^

If you want the pattern to match the whole string only, enclose it in ^ and $:

~r/^(?<foo>[^\t]*)\t(?<hello>[^\t]*)\t(?<world>[^\t]*)$/
   ^                                                  ^

The [^\t]* is a negated character class matching any char but a tab, 0 or more times (* is a greedy quantifier that matches 0 or more consecutive occurrences of the quantified subpattern).
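Applied to the test string from the question, the anchored pattern can be used as in this sketch. The variable names pattern, captures and too_few_fields are made up for illustration.

```elixir
test = "2018-06-16T07:03:23.813056Z\thello\tworld"

# Each group captures a run of non-tab characters; \t between them matches the separator.
pattern = ~r/^(?<foo>[^\t]*)\t(?<hello>[^\t]*)\t(?<world>[^\t]*)$/

captures = Regex.named_captures(pattern, test)
# %{"foo" => "2018-06-16T07:03:23.813056Z", "hello" => "hello", "world" => "world"}

# With only one tab in the input, the pattern cannot match and nil is returned.
too_few_fields = Regex.named_captures(pattern, "a\tb")
```

Because the groups use *, an empty field (two tabs in a row) still matches and is captured as an empty string.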


Upvotes: 1
