Reputation: 23
I have this sentence: "int open(const char *" pathname ", int " flags );
I am trying to find a regex that extracts the words outside the double quotes, in this example "pathname" and "flags". The regex I wrote only captures the word "flags" and not the word "pathname". Here is what I have:
reg2 = r"""(\".*\" (.*) )+\);"""
pattern2 = re.compile(reg2)
inner = m.group(1)
m2 = pattern2.search(inner)
EntityI = m2.group(2)
print EntityI
Note: m.group(1) is: "int open(const char *" pathname ", int " flags );
Thanks for the help!
Edit: Just to clarify some more, another possible case could be:
"int open(const char *" pathname ", int " flags ", mode_t " mode );
And I would want to extract the words: "pathname", "flags", and "mode".
Upvotes: 1
Views: 903
Reputation: 18950
This is a perfect case for the trash-can approach: forget everything that is not in capture group 1.
".*?"|(\w+)
Explanation: the pattern chooses between two alternatives separated by |
".*?" matches a quoted string from start to end. The quotes act as anchors, the . matches any character in between, and the * quantifier allows any number of repetitions. The ? makes the star lazy, so it matches as few characters as possible instead of greedily running on to the last quote.
(\w+) uses parentheses to define a capture group that captures one or more (+) word characters. \w is a shorthand character class that, for ASCII input, stands for [a-zA-Z0-9_] (the a-z notation is called a character range).
Sample code:
import re
regex = r'".*?"|(\w+)'
test_str = "\"int open(const char *\" pathname \", int \" flags );"
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    # group(1) is None when the quoted alternative matched, so those are skipped
    if match.group(1):
        print("Found at {start}-{end}: {group}".format(start=match.start(1), end=match.end(1), group=match.group(1)))
Output:
Found at 24-32: pathname
Found at 42-47: flags
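The same pattern should also cover the three-parameter case from the question's edit. Here is a minimal sketch that wraps it in a helper; the function name words_outside_quotes and the variable names are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above: quoted chunks are matched and thrown away,
# bare words end up in capture group 1.
PATTERN = re.compile(r'".*?"|(\w+)')

def words_outside_quotes(text):
    """Return the words in `text` that are not inside double quotes."""
    # Each match is either a quoted chunk (group 1 is None) or a bare word.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]

two_params = '"int open(const char *" pathname ", int " flags );'
three_params = '"int open(const char *" pathname ", int " flags ", mode_t " mode );'

print(words_outside_quotes(two_params))
print(words_outside_quotes(three_params))
```

This assumes the quotes in the input are balanced; with an unmatched quote, words from inside the intended quoted text would also be reported.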
Upvotes: 2
Reputation: 15072
Here's one way that replaces things inside quotes and then splits the resulting string. You'll probably want to do more processing since, as noted, the ); is also outside the quotes.
import re
my_string = '"int open(const char *" pathname ", int " flags );'
re.sub('".*?"', '_', my_string).split('_')[1:]
## [' pathname ', ' flags );']
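One possible form of that extra processing is sketched below. It swaps in re.split for the sub-then-split step (so a '_' placeholder cannot collide with identifiers that contain underscores) and assumes the only stray characters around the names are spaces, ")" and ";". The helper name names_via_split is hypothetical.

```python
import re

def names_via_split(text):
    """Split on quoted chunks, then trim whitespace and the trailing ');'."""
    # re.split removes every "..." chunk and returns the pieces in between.
    pieces = re.split(r'".*?"', text)
    # strip(' );') trims spaces, ')' and ';' from both ends of each piece.
    cleaned = [piece.strip(' );') for piece in pieces]
    # Drop empty pieces, such as the one before the first quoted chunk.
    return [name for name in cleaned if name]

my_string = '"int open(const char *" pathname ", int " flags );'
print(names_via_split(my_string))
```

If other punctuation can appear outside the quotes, the set of characters passed to strip would need to be extended.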
Upvotes: 0