TwelveFifths
TwelveFifths

Reputation: 25

Bold math characters into moodle with mathml converter

When I have $$\mathbf{x}$$ in my .Rmd file, and use exams2moodle with the pandoc-mathml converter, the xml file contains an "𝐱" character, which needs to be replaced with an "x" character before moodle will import the quiz question (because moodle will give an error saying the file is not UTF-8 without BOM.)

I wonder what are the most practical workarounds? Is this a bug? Thanks!

Minimal example: Here is minimal_example.Rmd

Question
========

Stare hard at the variable.
$$\mathbf{x}$$
What is its value?


Solution
========

If you think hard enough, you will know it is 12.

Meta-information
================
extype: num
exsolution: 12
exname: minimal_example
extol: 0

Here is the minimal_example.r

library("exams")
exams2moodle("minimal_example.Rmd", converter="pandoc-mathml")

And... here is a snippet of the resulting .xml file.

...
<questiontext format="html">
<text><![CDATA[<p>
<p>Stare hard at the variable. <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mstyle mathvariant="bold"><mi>𝐱</mi></mstyle><annotation encoding="application/x-tex">\mathbf{x}</annotation></semantics></math> What is its value?</p>
</p>]]></text>
</questiontext>
...

If I try importing the XML to my school's moodle, I get a dmlwriteexeption error. If I replace the "𝐱" with "x" the XML imports fine.

I am fairly certain my moodlequiz.xml file does not contain a BOM.

$ file moodlequiz.xml 
moodlequiz.xml: XML 1.0 document, UTF-8 Unicode text, with very long lines
$ hexdump -n 3 -C moodlequiz.xml 
00000000  3c 3f 78                                          |<?x|
00000003

I consider this question resolved. Hopefully nobody else has this issue, and I will use one of the proposed workarounds for my own files. Thanks!

Upvotes: 1

Views: 228

Answers (1)

Achim Zeileis
Achim Zeileis

Reputation: 17183

TL;DR

exams2moodle(..., converter = "pandoc-mathml") seems to work correctly and produces an UTF-8 encoded XML file moodlequiz.xml. The problem on your end appears to be caused by a BOM (byte order mark) in your XML file. It is unclear to me whether this is introduced through exams2moodle() or through an editor on your end.

Either you can remove the BOM manually or you can avoid the UTF-8 encoding altogether by using exams2moodle(..., converter = "pandoc-mathml-ascii"). The latter requires at least version 2.4-0 of the package.

Replication

Thanks for providing a reproducible example. I ran your example code - both on a Linux machine running in an UTF-8 locale and a Windows 10 machine - and can confirm that I get exactly the same XML code containing the UTF-8 encoded bold x: 𝐱. However, I have no problem importing that into my Moodle system.

Possible sources of the problem

So I looked up what the Moodle error message is about. Moodle does not accept UTF-8-encoded files with a BOM (byte order mark) at the beginning. Some systems use a BOM at the beginning of a file to declare how the file is encoded. See:

The moodlequiz.xml I produced on the two systems I mentioned above have no BOM. So I suspect that either your R setup produces a file with a BOM or the BOM is inserted later, e.g., after opening the XML file with an editor. The Moodle documentation above has some information on what you can do to detect the BOM and get rid of it. Hopefully, this lets you debug the problem on your end. If the BOM was produced by exams2moodle() (as opposed to your editor for example) and you find out how to avoid that, please let me know.

Alternative solution

In principle it is possible to replace the UTF-8 encocded characters by the corresponding HTML entities. For example, in this particular case we have a "MATHEMATICAL BOLD SMALL X" with Unicode U+1D431 (see https://www.w3.org/Math/characters/bold.html). Thus, we can also represent it as &#x1D431; (hexadecimal) or &#119857; (decimal). Then the XML file can be in ASCII while still leading to the same output in HTML.

While pandoc is generally designed to work with UTF-8 throughout it also has support for (hexa)decimal escapes in certain conversions, see https://pandoc.org/MANUAL.html#option--ascii. And luckily it is possible to combine the --mathml with the --ascii option. There was only a small bug in how R/exams passed on the option to the rmarkdown::pandoc_convert() function which I just fixed. So you need at least version 2.4-0 of exams and can then do:

exams2moodle(..., converter = "pandoc-mathml-ascii")

which yields a moodlequiz.xml in ASCII instead of UTF-8.

Upvotes: 1

Related Questions