Reputation: 2117
I have a dataset with a column with processes and the path. I am trying to use regex with Impala to strip off the executable. The dataset looks like this:
C:\\Windows\\System32\\svchost.exe
C:\\Windows\\System32\\conhost.exe
C:\\Windows\\System32\\net1.exe
C:\\Windows\\System32\\schtasks.exe
C:\\Program Files (x86)\\Citrix\\ICA Client\\SelfServicePlugin\\SelfService.exe
C:\\Windows\\System32\\backgroundTaskHost.exe
C:\\Windows\\System32\\net.exe
C:\\Windows\\System32\\conhost.exe
C:\\Program Files (x86)\\Wireless AutoSwitch\\wrlssw.exe
Desired output:
svchost.exe
conhost.exe
net1.exe
schtasks.exe
SelfService.exe
backgroundTaskHost.exe
net.exe
conhost.exe
wrlssw.exe
I have tried many queries like the two below but keep running into errors
select regexp_extract(w.destinationprocessname, '([^\\]+)$')
from winworkstations_realtime w
where w.externalid = '4688'
limit 10
Error:
AnalysisException: No matching function with signature: regexp_replace(STRING, STRING).
select regexp_extract(w.destinationprocessname, '\\(?:.(?!\\))+$',0)
from winworkstations_realtime w
where w.externalid = '4688'
limit 10
Error:
Could not compile regexp pattern: \(?:.(?!\))+$ Error: invalid perl operator: (?!
Looking for some guidance from anyone that is good with impala or regex.
Upvotes: 0
Views: 556
Reputation: 7947
Not being a regex expert, I am sure there is a better way but this does the work
select regexp_replace(regexp_extract("C:\\\Windows\\\\System32\\\\svchost.exe", ".+(\\\\.+)$", 1), "\\\\", "");
Upvotes: 1