Reputation: 1137
I am using OpenRefine to clean a dataset. I want to apply a regex to one of its columns. The regex has several capture groups, and I want to save each group into its own new column. I can apply the regex through Edit column > Add column based on this column. After selecting Python / Jython as the Language, I enter my Expression as shown below:
import re
regex = r"custom_regex"
value = re.findall(regex, value)
# If anything matched the regex, return the first match.
# To get individual groups: return value[0][0], or value[0][1], or value[0][2] etc.
if len(value) > 0:
    return value[0]
else:
    # No match: return a message instead of the empty list
    value = "No Match"
    return value
But with this method I can create only one column at a time. Is there a way to create as many columns as the regex has capture groups?
Upvotes: 0
Views: 567
Reputation: 2830
You cannot directly create more than one new column at a time with OpenRefine. However, you can simplify your script by using GREL instead of Python:
if(value.find(/YOUR REGEX/).length() > 0, value.find(/YOUR REGEX/).join("|"), "No match")
The .find() method in GREL (OpenRefine version >= 3) is pretty similar to re.findall() in Python.
Store the result in a new column, then use "Edit column > Split into several columns" with a pipe (|) as the separator to produce as many new columns as you have groups.
The Jython equivalent is probably something like this:
value = "1995 is a year"
Code
import re
regex = r"(\d+).+?(year)"
match = re.findall(regex, value)
if match:
    # match[0] is the tuple of groups from the first match
    return "|".join(match[0])
else:
    return "No Match"
Result
1995|year
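One caveat with re.findall(): when the regex has only one capture group it returns a list of plain strings rather than tuples, so joining the first element would put a pipe between every character. A minimal sketch of a variant that avoids this by using re.search() and .groups() is shown below. It is written as a standalone Python function so it can be tried outside OpenRefine; the function name groups_to_cell and its parameters are hypothetical, not part of OpenRefine, and inside an OpenRefine Jython expression you would inline the body and use the cell's value directly.

```python
import re

def groups_to_cell(text, pattern, sep="|", no_match="No Match"):
    """Join the capture groups of the first match into one delimited string.

    Hypothetical helper for illustration only; not an OpenRefine API.
    """
    m = re.search(pattern, text)
    if m is None:
        return no_match
    # groups() always returns a tuple, even when there is a single group.
    # Optional groups that did not take part in the match come back as None,
    # so they are replaced with an empty string to keep the column count stable.
    return sep.join(g if g is not None else "" for g in m.groups())

print(groups_to_cell("1995 is a year", r"(\d+).+?(year)"))  # 1995|year
print(groups_to_cell("1995 is a year", r"(\d+)"))           # 1995
print(groups_to_cell("no digits here", r"(\d+).+?(year)"))  # No Match
```

As with the code above, the joined string can then be split on the pipe to get one column per group. This sketch only looks at the first match in each cell.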
Upvotes: 1