Reputation: 199
Essentially, I have a text document with this in it:
The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering, a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
Using regex, I need to print everything within double quotes. I don't want the full code; I just need to know how I should go about it and which regex would be most useful. Tips and pointers please!
Upvotes: 2
Views: 82
Reputation: 12002
This should do it (explanation below):
from __future__ import print_function
import re
txt = """The sound of a horse at a gallop came fast and furiously up the hill.
"So-ho!" the guard sang out, as loud as he could roar.
"Yo there! Stand! I shall fire!"
The pace was suddenly checked, and, with much splashing and floundering,
a man's voice called from the mist, "Is that the Dover mail?"
"Never you mind what it is!" the guard retorted. "What are you?"
"_Is_ that the Dover mail?"
"Why do you want to know?"
"I want a passenger, if it is."
"What passenger?"
"Mr. Jarvis Lorry."
Our booked passenger showed in a moment that it was his name.
The guard, the coachman, and the two other passengers eyed him distrustfully.
"""
strings = re.findall(r'"(.*?)"', txt)
for s in strings:
    print(s)
Result:
So-ho!
Yo there! Stand! I shall fire!
Is that the Dover mail?
Never you mind what it is!
What are you?
_Is_ that the Dover mail?
Why do you want to know?
I want a passenger, if it is.
What passenger?
Mr. Jarvis Lorry.
r'"(.*?)"' will match every string within double quotes. The parentheses indicate a capture group, so you'll only get the text without the double quotes. The . matches every character (except for a newline), and the * means "zero or more of the last thing", the last thing being the . here. The ? after the * makes the * "non-greedy", which means it matches as little as possible. If you didn't use the ?, each match would run from the first double quote on a line to the last double quote on that same line, so a line holding two quotations would come back as one string with the narration between them still included.
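To see the greedy/non-greedy difference on its own, here is a minimal sketch that runs both patterns over one line from the sample text. The variable names are just for illustration.

```python
import re

# One line from the sample text that holds two separate quotations.
line = '"Never you mind what it is!" the guard retorted. "What are you?"'

# Non-greedy: .*? stops at the nearest closing quote, so each quotation
# is returned separately.
non_greedy = re.findall(r'"(.*?)"', line)

# Greedy: .* runs to the last quote on the line, so both quotations and
# the narration between them come back as a single string.
greedy = re.findall(r'"(.*)"', line)

print(non_greedy)
print(greedy)
```

The non-greedy call returns two strings, while the greedy call returns one.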
You can include the re.DOTALL flag so that . will also match newline characters, if you want to extract strings that cross lines. To do that, use re.findall(r'"(.*?)"', txt, re.DOTALL). The newline will be included in the string, so you'd have to check for that.
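As a rough illustration of the re.DOTALL behaviour, the sketch below uses a made-up string (not from the asker's file) in which one quotation wraps onto a second line, and shows one possible way to deal with the embedded newline afterwards.

```python
import re

# Hypothetical sample: a single quotation that wraps onto a second line.
wrapped = 'He said, "I want a passenger,\nif it is." Then he waited.'

# Without the flag, . cannot cross the newline, so nothing is found here.
without_flag = re.findall(r'"(.*?)"', wrapped)

# With re.DOTALL, . also matches the newline and the quotation is found.
with_flag = re.findall(r'"(.*?)"', wrapped, re.DOTALL)

# One way to handle the embedded newline: collapse every run of
# whitespace (including the newline) into a single space.
cleaned = [" ".join(s.split()) for s in with_flag]

print(without_flag)
print(with_flag)
print(cleaned)
```

Whether to keep, strip, or replace the newline depends on what you do with the strings afterwards; collapsing whitespace is just one option.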
Explanation unavoidably similar to / based on @TigerhawkT3's answer. Vote that answer up, too!
Upvotes: 0
Reputation: 49320
r'(".*?")' will match every string within double quotes. The parentheses indicate a captured group (here the group includes the quotes themselves), the . matches every character (except for a newline), the * indicates repetition, and the ? makes it non-greedy (stops matching right before the next double quote). If you want, include the re.DOTALL option to make . also match newline characters.
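The only practical difference between this pattern and one with the parentheses inside the quotes is whether the quote characters end up in the result. A small sketch, assuming two quotations from the sample text joined onto one line for illustration:

```python
import re

# Two quotations from the sample text, placed on one line for the demo.
line = '"What passenger?" "Mr. Jarvis Lorry."'

# Group wraps the quotes: the returned strings keep their double quotes.
with_quotes = re.findall(r'(".*?")', line)

# Group sits inside the quotes: only the quoted text is returned.
without_quotes = re.findall(r'"(.*?)"', line)

# Slicing off the first and last character gives the same text.
stripped = [s[1:-1] for s in with_quotes]

print(with_quotes)
print(without_quotes)
```

Pick whichever form suits what you want to print.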
Upvotes: 3