Konrad

Reputation: 18657

Horror of versioning MS Word files

Faced with an unfortunate need to version MS Word documents, I have implemented the following configuration:

~/.gitconfig

# Help MS Word document versioning
[diff "pandoc"]
    textconv=pandoc --to=markdown
    prompt = false

./repo/.gitattributes

# Version control MS Word
*.docx diff=pandoc
*.docm diff=pandoc

Problem

When I run git diff on Big-Problematic-Document.docm, I get the following error:

19:17 $ git diff Big-Problematic-Document.docm
UTF-8 decoding error in /var/folders/7x/kwc1y_l96t55_rwlv35mg8xh0000gn/T//uPSuEc_Big-Problematic-Document.docm at byte offset 22 (c1).
The input must be a UTF-8 encoded text.
fatal: unable to read files to diff

Diagnosis

Question

Is there a way to extend ~/.gitconfig so that the pandoc conversion strips non-UTF-8 text?

Upvotes: 0

Views: 305

Answers (1)

Cristiano

Reputation: 11

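It seems that pandoc does not know how to handle .docm (Word with VBA macros). It picks the input format from the file extension, and for an extension it does not recognise it falls back to reading the file as Markdown text, which is why it complains about UTF-8. Give it a hint by adding --read=docx, which tells pandoc explicitly that the input is in fact a Word document (docx).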

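You probably want this to apply to the docm line in your .gitattributes. Note that .gitattributes only selects which diff driver is used; the flag itself goes into that driver's textconv command in ~/.gitconfig.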

That cures it with pandoc 2.9.1.1.
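
A minimal sketch of how this could look in ~/.gitconfig, assuming you keep the existing pandoc driver for .docx and add a second driver just for .docm. The driver name pandoc-docm is made up for illustration; any name works as long as .gitattributes uses the same one.

```ini
# Existing driver: pandoc recognises the .docx extension on its own
[diff "pandoc"]
    textconv = pandoc --to=markdown

# Hypothetical second driver for .docm: tell pandoc explicitly
# to read the input as docx instead of guessing from the extension
[diff "pandoc-docm"]
    textconv = pandoc --read=docx --to=markdown
```

With that in place, the docm line in ./repo/.gitattributes would read *.docm diff=pandoc-docm. Alternatively, adding --read=docx to the single existing driver should also work, since both extensions are then read as docx.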

Upvotes: 1
