I was able to use this question as a starting point for parsing an "mht" file, but the "3D" in the anchor tags (e.g. `<a href=3D"[my anchor]">[anchor text]</a>`) breaks all the internal links and embedded images. I can have the parser replace "=3D" with just "=" (e.g. `<a href="[my anchor]">[anchor text]</a>`), and that appears to work fine, but I want to understand the purpose of that "meta markup".
Why does exporting from ".docx" to ".mht" add "3D" to the right-hand side of most (if not all) of the HTML attributes? Is there a better way to handle them, or a better regex to use when replacing them?
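The `=3D` is a result of quoted-printable encoding. In quoted-printable, `=` is the escape character, so a literal `=` has to be written as `=3D` (0x3D is the ASCII code for `=`). That is why it shows up after every attribute name. An MHT file is a MIME document, and the HTML part in yours was evidently stored with this encoding (its part headers should say `Content-Transfer-Encoding: quoted-printable`). It shouldn't be too hard to find a Java library for decoding quoted-printable data.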
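Replacing only `=3D` is fragile, because quoted-printable also escapes other bytes (`=20`, or multi-byte UTF-8 such as `=E2=80=9C`) and wraps long lines with "soft" line breaks (a trailing `=` before the newline), which can split an attribute value across lines. Decoding the whole part handles all of these at once. Below is a minimal, stdlib-only sketch of a decoder, written by hand instead of using a library (Apache Commons Codec's `QuotedPrintableCodec` is one existing option). The class name `QuotedPrintable` is hypothetical, and the charset is an assumption: take it from the part's `Content-Type` header.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a minimal quoted-printable decoder (RFC 2045, section 6.7).
// It does not strip trailing whitespace on encoded lines, which a full decoder should.
class QuotedPrintable {

    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = encoded.length();
        int i = 0;
        while (i < n) {
            char c = encoded.charAt(i);
            if (c != '=') {
                // Ordinary character: QP text is 7-bit ASCII, so write it as one byte.
                out.write(c);
                i++;
                continue;
            }
            // Soft line break ("=" at end of line): the encoder wrapped the line, so drop it.
            if (i + 1 < n && encoded.charAt(i + 1) == '\n') {
                i += 2;
                continue;
            }
            if (i + 2 < n && encoded.charAt(i + 1) == '\r' && encoded.charAt(i + 2) == '\n') {
                i += 3;
                continue;
            }
            // Escape sequence: "=" plus two hex digits, e.g. "=3D" becomes the byte 0x3D ('=').
            if (i + 2 < n) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            // Malformed "=": keep it literally instead of failing.
            out.write('=');
            i++;
        }
        // Decode the collected bytes with the part's declared charset.
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String sample = "<a href=3D\"#sec1\">Section 1</a> with a soft=\r\nbreak";
        System.out.println(decode(sample, StandardCharsets.UTF_8));
    }
}
```

Decoding to bytes first and only then applying the charset matters: `=E2=80=9C` is three bytes of one UTF-8 character, so converting each escape to a `char` on its own would garble it.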