Reputation: 21
I want to extract values from the string below using a regular expression:
"a:4:{i:0;s:24:\"hello \"tejo krishna\"!!!`\";i:1;s:11:\"hello \"xyz\"\";i:2;s:6:\"defeat\";i:3;s:7:\"pattern\";}"
From the string above, I want to extract the text shown in italics. Any help is appreciated.
Thanks,
Upvotes: 1
Views: 42
Reputation: 4576
The exact constraints on the acceptable characters are not clear, and you don't say which language you are using. In Python, though, the regex below works for your example. If you expect more kinds of characters in the input, just extend the character classes:
import re
myre = re.compile(r'\\"([\sa-zA-Z0-9]+\\?"?[\sa-zA-Z0-9]+\\?"?[!`]*)\\"')
s = r'"a:4:{i:0;s:24:\"hello \"tejo krishna\"!!!`\";'\
r'i:1;s:11:\"hello \"xyz\"\";i:2;s:6:\"defeat\";i:3;'\
r's:7:\"pattern\";}"'
match = myre.findall(s)
# results
# ['hello \\"tejo krishna\\"!!!`', 'hello \\"xyz\\"',
# 'defeat', 'pattern']
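As a rough sketch of what extending the classes could look like: assume, purely for illustration, that the values may also contain commas, periods and hyphens. The names EXTRA, CHARS and extended, and the sample string, are made up for this example; only the overall pattern structure comes from the regex above.

```python
import re

# Hypothetical extra characters to allow; adjust to your real input.
EXTRA = r",.\-"

# One shared character class, so the extension is made in a single place.
CHARS = r"[\sa-zA-Z0-9" + EXTRA + "]"

# Same structure as the pattern in this answer, built from CHARS.
extended = re.compile(r'\\"(' + CHARS + r'+\\?"?' + CHARS + r'+\\?"?[!`]*)\\"')

# Made-up input in the same format as the question's string.
sample = r'i:0;s:12:\"hello, world\";i:1;s:7:\"a-b \"c\"\";'
matches = extended.findall(sample)
print(matches)
```

Building the class from one variable keeps both occurrences in the pattern in sync when you add more characters later.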
Note: in Python, the backslash (\) is an escape character, so it needs to be escaped in strings, hence the double backslashes in the output. In a regex the backslash is also an escape character, hence the double backslashes in the pattern. Because the pattern is defined as a raw string (note the r in front of the string, r'...'), Python does not require us to escape the backslashes; we escape them only for the regex engine. Otherwise you would need four backslashes in a normal string: '\\\\"([\\sa-zA-Z0-9]+\\\\?"?[\\sa-zA-Z0-9]+\\\\?"?[!`]*)\\\\"'. You need to do this if your programming language has no raw strings.
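To make the raw-string point concrete, here is a small check, offered as a sketch rather than anything definitive. It writes the pattern once as a raw string and once as a normal string with four backslashes, using A-Z for the uppercase range. The variable names and the shortened sample input are my own.

```python
import re

# The same pattern twice: once as a raw string, once as a normal string.
raw_pattern = r'\\"([\sa-zA-Z0-9]+\\?"?[\sa-zA-Z0-9]+\\?"?[!`]*)\\"'
normal_pattern = '\\\\"([\\sa-zA-Z0-9]+\\\\?"?[\\sa-zA-Z0-9]+\\\\?"?[!`]*)\\\\"'

# Both literals hand exactly the same characters to the regex engine.
same_text = raw_pattern == normal_pattern

# Shortened, made-up input in the same format as the question's string.
sample = r's:6:\"defeat\";s:7:\"pattern\";'
raw_matches = re.findall(raw_pattern, sample)
normal_matches = re.findall(normal_pattern, sample)
print(same_text, raw_matches, normal_matches)
```

If the two literals ever compare unequal, a backslash was lost somewhere in the normal-string version, which is the usual way this kind of pattern breaks.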
Upvotes: 1