Shaun
Shaun

Reputation: 23

Tableau Regular Expression Regexp_Extract() Troubles

I am trying to use the REGEXP_EXTRACT(string, pattern) function in Tableau 10.0. I have worked with regex before in C, R and python. Tableau just doesn't seem to be cooperating for even the simplest regex, all I can get is "Null". My data source is an Excel spreadsheet (.xlsx). I also tried using a CSV file with the same data to no avail.

Examples: Players[1] = "Ezekiel Elliott, RB" (r.e the Players dimension contains values of the format [A-z]{1,40},\s[A-Z]{2}). I am trying to capture the last two letters, which contain the player's football position (RB, QB, etc.) in a calculated field. All following simply return null:

REGEXP_EXTRACT([Players], '(?<=,\s)[A-Z]{2}')

REGEXP_EXTRACT([Players], '(?<=,\s)[A-Z]{2}')

REGEXP_EXTRACT[[Players], "[e,E]")

REGEXP_EXTRACT("Ezekiel Elliott, RB", "[e,E]")

The last two are especially troubling, as I can't even match a letter.

Any solutions, explanations of what I'm doing wrong, or even an arrow pointing in an illuminative direction would be greatly appreciated. I've consulted the reference on ICU regular expressions, http://userguide.icu-project.org/strings/regexp, but with no luck.

Upvotes: 2

Views: 4236

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627317

You need to capture what you need to extract:

REGEXP_EXTRACT[[Players], "([A-Z]{2})$")

The ([A-Z]{2})$ will match the following:

  • ([A-Z]{2}) - matches and captures into Group 1 (what you will get in the end) exactly two ASCII uppercase letters
  • $ - asserts the end of string position.

If you say the entries can be matched with ^[A-Za-z]{1,40},\s([A-Z]{2})$, you may also use this pattern (here, ^ matches the start of string, [A-Za-z]{1,40} will match 1 to 40 ASCII letters, , will match a comma, and the rest of the pattern is the same as above). Note that [A-z] matches more than just letters.

Upvotes: 3

Related Questions