Jhonathan Asimov

Reputation: 69

Regex matching with negative look-behind assertion

I am parsing C source files. I want to match all the variables (in snake-case format) that end in _VALUE and don't begin with CANA_, CANB_, ..., CANF_. I need to match the whole variable name for later substitution.

This is my current setup in Python:

import re

def signal_ending_VALUE_updater(match: re.Match) -> str:
    # Named groups captured by the match, keyed by group name
    groups = match.groupdict()
    return some_operation_on(groups["SIGNAL_NAME"])

REGEX = r"(?<!CAN[A-F]_)\b(?P<SIGNAL_NAME>\w+_VALUE)\b"

with open(file_path, 'r') as f:
    content = f.read()
    content_new = re.sub(REGEX, signal_ending_VALUE_updater, content)

Unfortunately, this regex doesn't always work. For example, take this test case:

test="        shared->option.mem = ((canAGetScuHmiVehReqLiftModBtnSt() == CANA_SCU_HMI_VEH_REQ_LIFT_MOD_BTN_ST_PRESSED_VALUE) ||"
re.findall(REGEX, test)

This returns the variable (CANA_SCU_HMI...), which I don't want to match. What am I not considering in the regex?

The idea behind the regex is:

Upvotes: 1

Views: 54

Answers (1)

The fourth bird

Reputation: 163577

This part of your regex (?<!CAN[A-F]_)\b asserts that the pattern CAN[A-F]_ does not occur directly to the left of the current position, and that the current position is a word boundary.

You get a match for the text CANA_SCU_HMI_VEH_REQ_LIFT_MOD_BTN_ST_PRESSED_VALUE because the assertion is checked at the beginning of that text. The characters directly to the left of that position are ") == ", not CAN[A-F]_, so the assertion is true.

What you can do instead is start with a word boundary, and then assert that what is directly to the right does not match the pattern CAN[A-F]_:
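To make this concrete, here is a small sketch that runs the question's original pattern against the test line and inspects the text the look-behind actually sees. The names LOOKBEHIND, left_context and matched are only illustrative, and the five-character window is just the length of a prefix such as CANA_.

```python
import re

# The original pattern from the question: look-behind placed before \b.
LOOKBEHIND = r"(?<!CAN[A-F]_)\b(?P<SIGNAL_NAME>\w+_VALUE)\b"

test = "        shared->option.mem = ((canAGetScuHmiVehReqLiftModBtnSt() == CANA_SCU_HMI_VEH_REQ_LIFT_MOD_BTN_ST_PRESSED_VALUE) ||"

m = re.search(LOOKBEHIND, test)

# The look-behind inspects the characters to the LEFT of the match start,
# not the first characters of the identifier itself.
left_context = test[max(0, m.start() - 5):m.start()]
matched = m.group("SIGNAL_NAME")

print(repr(left_context))  # ') == '
print(matched)             # CANA_SCU_HMI_VEH_REQ_LIFT_MOD_BTN_ST_PRESSED_VALUE
```

Since a word boundary can never sit between the trailing underscore of CANA_ and the word character that follows it, this look-behind cannot reject any candidate that \b\w+_VALUE\b would otherwise match.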

\b(?!CAN[A-F]_)(?P<SIGNAL_NAME>\w+_VALUE)\b

See a regex 101 demo
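As a quick check in Python, the look-ahead version can be tried with both findall and sub. This is only a sketch: the sample line and rename_signal are made up for illustration, with rename_signal standing in for the question's some_operation_on.

```python
import re

# Word boundary first, then a negative look-ahead on the start of the name.
FIXED = re.compile(r"\b(?!CAN[A-F]_)(?P<SIGNAL_NAME>\w+_VALUE)\b")

def rename_signal(match: re.Match) -> str:
    # Hypothetical replacement: lower-case the matched name.
    return match.group("SIGNAL_NAME").lower()

# Made-up sample line: one excluded prefix (CANA_), two names that should match.
source = "x = (a == CANA_BTN_PRESSED_VALUE) || (b == MOTOR_SPEED_MAX_VALUE) || (c == CANG_MODE_VALUE);"

found = FIXED.findall(source)
updated = FIXED.sub(rename_signal, source)

print(found)    # ['MOTOR_SPEED_MAX_VALUE', 'CANG_MODE_VALUE']
print(updated)
```

CANG_MODE_VALUE still matches because only the prefixes CANA_ through CANF_ are excluded by the character class [A-F].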

Upvotes: 2
