Reputation: 9672
I am trying to grab filenames from a list of endings that looks like this:
final count: {'.pem': 5027, '__base__': 434, '.rb': 62341, '/AUTHORS': 1358, '.sty': 859, '.gitignore': 193,...}
My regex looks as follows:
p = re.compile(r"'([\W]+)(.*?)'")
It works fine except on '__base__', where I get '__base__' instead of the 'base' I want, because the underscore counts as a word character. I tried:
p = re.compile(r"'([\W]+|\_+)(.*?)'")
p = re.compile(r"'([\W]+|_+)(.*?)'")
and
p = re.compile(r"'([\W]+)|(_+)(.*?)'")
but none worked. What is the proper way to do this? Thank you
Upvotes: 0
Views: 463
Reputation: 760
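Try adding the underscore to your character class:
p = re.compile(r"'([\W_]+)(.*?)'")
When ^ is outside a character class (the square brackets), it anchors the match to the beginning of the string, or to the beginning of a line in multiline mode. Inside a character class it means "not", but only when it is the first character, as in [^_]. Anywhere else in the class, as in [\W^_], it is a literal caret, so it is not needed here. Note that this only strips the leading underscores: for '__base__' the second group is still 'base__'.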
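To check how the caret behaves inside square brackets, here is a small sketch. The `keys` list and the variable names are illustrative, and `[\W_]` is a simplified variant of the pattern that lists the underscore without a caret.

```python
import re

# Sample keys taken from the question.
keys = ["'.pem'", "'__base__'", "'/AUTHORS'"]

# [\W_] means: any non-word character, or an underscore.
p = re.compile(r"'([\W_]+)(.*?)'")
stems = [p.match(k).group(2) for k in keys]
print(stems)

# A caret negates the class only when it is the first character in it.
caret_first = re.findall(r"[^_]", "a_^")    # every character except "_"
caret_later = re.findall(r"[\W^_]", "a_^")  # "_" and a literal "^"
print(caret_first, caret_later)
```

The leading prefix is removed, but trailing underscores stay in the second group, so '__base__' still needs extra handling if only 'base' is wanted.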
Upvotes: 2
Reputation: 5696
You can use this:
re.findall(r"([a-zA-Z0-9]+)_{0,2}':", my_str)
It will capture only consecutive letters and numbers that come before 0 to 2 underscores and ':, since you only need the string before ':.
Explanation:
{0,2} matches 0 to 2 repetitions of the preceding token.
[a-zA-Z0-9]+ is used instead of \w+ since the latter would match _ as well.
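As a quick check, the pattern can be run against the sample from the question. This is a sketch: `my_str` is assumed to hold only the entries shown there, with the elided part left out.

```python
import re

# Sample text from the question, limited to the entries that are shown.
my_str = ("{'.pem': 5027, '__base__': 434, '.rb': 62341, "
          "'/AUTHORS': 1358, '.sty': 859, '.gitignore': 193}")

# Letters and digits, then up to two underscores, then the closing quote and colon.
names = re.findall(r"([a-zA-Z0-9]+)_{0,2}':", my_str)
print(names)
# ['pem', 'base', 'rb', 'AUTHORS', 'sty', 'gitignore']
```

The counts such as 5027 are skipped because they are not followed by ':. A key that contains other characters in the middle, such as a hyphen, would only yield its last run of letters and digits.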
Upvotes: 1