Reputation: 39
I have rich text in a QTextEdit, and I store its HTML source in a string. What I'd like to do is extract the lines one by one (there are 4 to 6 of them). The string looks like this:
```
//html opening stuff
<p style = attributes...><span style = attributes...>My Text</span></p>
//more lines like this
//html closing stuff
```
So I need each WHOLE line, from the opening `<p>` tag through the closing `</p>` tag (including the tags themselves). I've tried everything I could find here and on other sites, but nothing has worked.
Here's my code ("htmlStyle" is the input string):
```cpp
QStringList list;
QRegExp rx("(<p[^>]*>.*?</p>)");
int pos = 0;
while ((pos = rx.indexIn(htmlStyle, pos)) != -1) {
    list << rx.cap(1);
    pos += rx.matchedLength();
}
```
Or is there any other way to do this without regex?
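For comparison, here is a minimal sketch of the same "collect every match" loop in standard C++ (`std::regex`, not Qt). One likely reason the QRegExp version misbehaves: as far as I know, QRegExp does not support per-quantifier lazy syntax such as `.*?`; its documented alternative is `setMinimal(true)`, which makes every quantifier in the pattern non-greedy. The ECMAScript grammar used by `std::regex` (like Qt 5's `QRegularExpression`) does accept `.*?`. The function name `extractParagraphs` is hypothetical, and the sketch assumes each paragraph sits on a single line, since `.` does not match line terminators in the ECMAScript grammar.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every "<p ...>...</p>" run from an HTML string.
// Assumes each paragraph is on one line ('.' does not match a newline here).
std::vector<std::string> extractParagraphs(const std::string &html) {
    // Lazy ".*?" stops at the first "</p>" after the opening tag.
    static const std::regex rx("<p[^>]*>.*?</p>");
    std::vector<std::string> out;
    // Walk every non-overlapping match, left to right.
    for (std::sregex_iterator it(html.begin(), html.end(), rx), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Each element of the returned vector is one complete `<p ...>...</p>` line, tags included.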
Upvotes: 0
Views: 929
Reputation: 39
For those who need the full Qt solution: I figured it out based on @Aditya Poorna's answer. Thanks for the tip!
Here's the code:
```cpp
#include <QDebug>
#include <QString>

int startIndex = htmlStyle.indexOf("<p");
while (startIndex >= 0) {
    // Look for the closing tag that belongs to this paragraph.
    int endIndex = htmlStyle.indexOf("</p>", startIndex);
    if (endIndex < 0)
        break;                  // no closing tag left: stop
    endIndex += 4;              // include "</p>" in the result
    QStringRef subString(&htmlStyle, startIndex, endIndex - startIndex);
    qDebug() << subString;
    startIndex = htmlStyle.indexOf("<p", endIndex);
}
```
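`subString` is a QStringRef into `htmlStyle` that starts at `startIndex` and is `endIndex - startIndex` characters long, so no copy is made. (QStringRef was removed from Qt Core in Qt 6; there, `QStringView(htmlStyle).mid(startIndex, endIndex - startIndex)` does the same job.)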
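The same scan can be written in standard C++17, where `std::string_view` plays the role of `QStringRef`: each result is a non-owning view into the original string, so nothing is copied, but the views are only valid while that string stays alive and unmodified. This is a sketch under the same assumptions as the answer above; `paragraphViews` is a hypothetical name, and searching for `"<p"` would also match tags such as `<pre>`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: non-owning views of every "<p ...>...</p>" span in html.
// The views point into the caller's buffer and must not outlive it.
std::vector<std::string_view> paragraphViews(std::string_view html) {
    std::vector<std::string_view> out;
    const std::string_view open = "<p", close = "</p>";
    std::size_t start = html.find(open);
    while (start != std::string_view::npos) {
        // Closing tag that follows this opening tag.
        std::size_t end = html.find(close, start);
        if (end == std::string_view::npos)
            break;                       // unterminated paragraph: stop scanning
        end += close.size();             // include "</p>" in the view
        out.push_back(html.substr(start, end - start));
        start = html.find(open, end);    // continue after this paragraph
    }
    return out;
}
```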
Upvotes: 0
Reputation: 2415
Below is a pure Java way; hope this helps:
```java
int startIndex = htmlStyle.indexOf("<p");   // "<p", not "<p>", so tags with attributes match too
while (startIndex >= 0) {
    int endIndex = htmlStyle.indexOf("</p>", startIndex);
    if (endIndex < 0) break;                 // no closing tag left
    endIndex = endIndex + 4;                 // to include </p> in the substring
    System.out.println(htmlStyle.substring(startIndex, endIndex));
    startIndex = htmlStyle.indexOf("<p", endIndex);
}
```
Upvotes: 1
Reputation: 98495
HTML/XML is not a regular language, so you cannot reliably parse it with a regex. See e.g. this question. Parsing HTML is not trivial.
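A small illustration of why a regex is fragile here, as a sketch using standard C++ `std::regex` (the helper name `firstParagraphByRegex` is hypothetical): an HTML comment that happens to contain `</p>` makes the lazy pattern from the question stop too early, because a regex has no notion of comments, nesting, or quoting.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: first "<p...>...</p>" found by a lazy regex, or "" if none.
std::string firstParagraphByRegex(const std::string &html) {
    static const std::regex rx("<p[^>]*>.*?</p>");
    std::smatch m;
    return std::regex_search(html, m, rx) ? m.str() : std::string();
}
```

For the input `<p>a<!-- </p> -->b</p>` this returns `<p>a<!-- </p>`, cutting the paragraph in half at the `</p>` inside the comment.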
You can iterate over the paragraphs of a rich text document using `QTextDocument`, `QTextBlock`, `QTextCursor`, etc. All the HTML parsing is taken care of for you. `QTextDocument` handles exactly the subset of HTML that `QTextEdit` supports, because `QTextEdit` uses a `QTextDocument` as its internal representation. You can get it directly from the widget using `QTextEdit::document()`. E.g.:
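```cpp
#include <QTextBlock>
#include <QTextDocument>
#include <QTextEdit>

void iterate(QTextEdit * edit) {
    auto const & doc = *edit->document();
    // QTextBlock::next() returns the following block; assign it back to advance.
    for (auto block = doc.begin(); block != doc.end(); block = block.next()) {
        // do something with the text block (one paragraph), e.g. iterate its fragments
        for (auto fragment = block.begin(); fragment != block.end(); ++fragment) {
            // do something with fragment.fragment(), e.g. its text() or charFormat()
        }
    }
}
```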
Instead of incorrectly parsing HTML by hand, you should explore the structure of the `QTextDocument` and use it as needed.
Upvotes: 2