Reputation: 39
I have had this issue in multiple applications now and I am wondering if anyone has come up with a more efficient solution than mine. Essentially, my goal is to convert the content of a cell to an HTML string that includes all of its formatting. My workaround so far has been to loop through each character in the string to determine the font size, weight, and style; however, this can be extremely slow when converting a lot of data at once.
Upvotes: 2
Views: 1618
Reputation: 27516
Going through each character in turn will be very slow, but should only be necessary in extreme cases. I've tackled this same problem quite successfully using the following method.
For each relevant property (bold, italic, etc.) I build up an array that stores the position of each change in the value of that property. Then when generating the HTML, I can spit out all the text up until the next change (in any property). Where changes are infrequent, this is clearly faster.
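To make the output step concrete, here is a minimal Python sketch, not the original code (which isn't shown here). It assumes the change positions have already been collected; the `changes` layout, the `TAGS` mapping and the function name `runs_to_html` are all made up for illustration.

```python
from html import escape

# Hypothetical mapping from tracked property to the HTML tag used for it.
TAGS = {"bold": "b", "italic": "i"}

def runs_to_html(text, changes):
    """Emit HTML for `text` given per-property change positions.

    `changes` maps a property name to a sorted list of (position, value)
    pairs; each pair means the property has `value` from `position` onward.
    A property with no entry at or before a position is treated as off.
    """
    # Every position where any property changes, plus both ends of the text.
    cuts = sorted({0, len(text)} | {pos for lst in changes.values() for pos, _ in lst})
    out = []
    for start, end in zip(cuts, cuts[1:]):
        # Nothing changes inside [start, end), so emit it as one chunk.
        chunk = escape(text[start:end])
        for prop, tag in TAGS.items():
            value = False
            for pos, val in changes.get(prop, []):
                if pos <= start:
                    value = val  # latest value set at or before this run
            if value:
                chunk = f"<{tag}>{chunk}</{tag}>"
        out.append(chunk)
    return "".join(out)

html_out = runs_to_html(
    "Hello world",
    {"bold": [(0, True), (5, False)], "italic": [(6, True)]},
)
print(html_out)
```

The loop runs once per formatting run rather than once per character, so a cell with no formatting changes is emitted in a single step.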
Now, to arrive at the position of the changes in each property, I first test whether there are in fact any changes, and this is easy - for example, Font.Bold will return true if all the text is bold, false if it's all non-bold, and null (or some other value - I can't remember) if there are both bold and non-bold parts.
So, if there's no change in the value at all, we're done already. If there is a change in the value, then I do a binary sub-division of the text into two halves and start again. Again, I might find that one half is all the same, and the other half contains a change, so I do another sub-division of the second half as before, and so on.
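The sub-division could be sketched roughly as follows in Python. This is only an illustration under assumptions: `probe(start, end)` is a hypothetical stand-in for whatever call the host application provides to test a range of characters (such as the Font.Bold check described above), and is assumed to return True or False when the range is uniform and None when it is mixed. `make_probe` just fakes that call from a list of per-character flags so the example can run on its own.

```python
def find_runs(probe, start, end):
    """Return a list of (start, end, value) runs covering [start, end).

    `probe(start, end)` is assumed to return True/False if the property is
    uniform over the range and None if the range contains a change.
    """
    value = probe(start, end)
    if value is not None or end - start <= 1:
        return [(start, end, value)]  # uniform (or too small to split)
    mid = (start + end) // 2
    left = find_runs(probe, start, mid)
    right = find_runs(probe, mid, end)
    # The split point may fall inside a run, so rejoin matching neighbours.
    if left[-1][2] == right[0][2]:
        joined = (left[-1][0], right[0][1], left[-1][2])
        return left[:-1] + [joined] + right[1:]
    return left + right

def make_probe(flags):
    """Fake range test built from per-character flags (illustration only)."""
    def probe(start, end):
        seen = set(flags[start:end])
        return seen.pop() if len(seen) == 1 else None
    return probe

flags = [False, False, True, True, True, False, False, False]
print(find_runs(make_probe(flags), 0, len(flags)))
```

A cell with uniform formatting costs a single probe, and each change only forces extra probes along the halves that actually contain it.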
Since very few cells tend to have lots of changes, and many have none at all, this ends up being quite efficient. Or at least much more efficient than the character-by-character method.
Upvotes: 1