Reputation: 821
I'm generating some odt/docx reports via markdown using knitr and pandoc and am now wondering how you'd go about formatting tables. Primarily I'm interested in adding rules (at least top, bottom and one below the header, but being able to add arbitrary ones inside the table would be nice too).
Running the following example from the pandoc documentation through pandoc (without any special parameters) just yields a "plain" table without any kind of rules/colours/guides (in either -t odt or -t docx).
+---------------+---------------+--------------------+
| Fruit | Price | Advantages |
+===============+===============+====================+
| Bananas | $1.34 | - built-in wrapper |
| | | - bright color |
+---------------+---------------+--------------------+
| Oranges | $2.10 | - cures scurvy |
| | | - tasty |
+---------------+---------------+--------------------+
I've looked through the "styles" for the possibility of specifying table formatting in a reference .docx/.odt but found nothing obvious beyond the "table header" and "table contents" styles, both of which seem to concern only the formatting of text within the table.
Being rather unfamiliar with WYSIWYG-style document processors I'm lost as to how to continue.
Upvotes: 33
Views: 21804
Reputation: 21
This is possible to do with lua filters since pandoc version 3.4 because of this PR: https://github.com/jgm/pandoc/pull/10009
You need to update the element attribute "custom-style" for the table, and pandoc will correctly set the w:tblStyle value in the resulting docx xml structure to have a non-default table.
If you are using pandoc markdown as your source, and you are building a docx file, you can't set that element attribute using default pandoc markdown syntax and that is why you'd need to approach this with something like a lua filter.
Here is an example Lua filter that gives every table in your docx output the custom style of your choice, including a style that you define yourself in a reference doc:
-- table-filter.lua
local tablestyle = "TableCustom"
function Table(elem)
-- https://pandoc.org/lua-filters.html#type-table
if elem.attr.attributes["custom-style"] == nil then
io.stderr:write("Table style set to: " .. tablestyle .. "\n")
elem.attr.attributes["custom-style"] = tablestyle
end
return elem
end
You would use this Lua filter by running pandoc with the filter and with a reference doc that defines the TableCustom table style you made:
pandoc --reference-doc custom-reference-doc.docx -L table-filter.lua input.md -o output.docx
For me, this is superior to creating an additional stage of generation where you decompress and edit the docx XML structure directly.
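To confirm that the filter took effect, you can look at which w:tblStyle values ended up in the generated file. The following is only a minimal sketch using the Python standard library; the function name table_styles is made up for illustration, and it assumes the docx stores its body in word/document.xml with the usual w: prefix.

```python
import re
import zipfile


def table_styles(docx_path):
    """Return the w:tblStyle values found in word/document.xml, in document order.

    Hypothetical helper: a docx is a zip archive, so no Word installation
    or third-party library is needed to inspect it.
    """
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Accept both <w:tblStyle w:val="X"/> and <w:tblStyle w:val="X" />.
    return re.findall(r'<w:tblStyle\s+w:val="([^"]*)"\s*/>', xml)


if __name__ == "__main__":
    import sys

    # Usage: python check_styles.py output.docx
    for style in table_styles(sys.argv[1]):
        print(style)
```

If the filter ran, every table should report TableCustom rather than the default style.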
Upvotes: 2
Reputation: 11942
Using a reference docx file and then python-docx does the job pretty easily:
https://python-docx.readthedocs.io/
First convert your document to docx:
Bash:
pandoc --standalone --data-dir=/path/to/reference/ --output=/tmp/xxx.docx input_file.md
Notes:
/path/to/reference/ points to the folder containing reference.docx
reference.docx is a file containing the styles you need for docx elements
Then give the tables of your document the style you want to use:
Python:
import docx
document = docx.Document('/tmp/xxx.docx')
for table in document.tables:
table.style = document.styles['custom_style'] # custom_style must exist in your reference.docx file
document.save("target.docx") # thank you Anish
Upvotes: 4
Reputation: 1
Add a Lua filter and define your own table style; see this Lua filter: https://github.com/ZhouJunjun/TyporaLuaFilter
Upvotes: -1
Reputation: 23064
I really liked gbjbaanb's answer - here's a powershell version:
Background: Set up a pandoc --reference-doc template as described in the pandoc documentation for the --reference-doc parameter.
Open up Word and create a new custom table style in the template doc. In our example that custom table style is called 'MyCustomTable'.
Generate your Word doc using the --reference-doc parameter. The custom table style will be included in the doc; you just have to insert its name in the right place. This bit of PowerShell will do that for you:
$outFile = "C:\Path\To\Your\Doc.docx"
$workFolder = "C:\Some\Temp\Folder\Somewhere\"
# then this replaces table style in $outFile:
$zipFile = $outFile.Replace(".docx",".zip")
Rename-Item $outFile $zipFile
Expand-Archive $zipFile -DestinationPath $workFolder -Force
$wordXml = Get-Content "${workFolder}Word\Document.xml"
$updatedXml = $wordXml.Replace('<w:tblStyle w:val="Table" />','<w:tblStyle w:val="MyCustomTable" />')
Set-Content -Path "${workFolder}Word\Document.xml" -Value $updatedXml
Compress-Archive -Path "${workFolder}*" -DestinationPath $zipFile -Force
Rename-Item $zipFile $outFile
... where $outFile is the docx, and $workFolder is a temp folder somewhere.
In some earlier versions of pandoc, instead of searching for <w:tblStyle w:val="Table" /> you'll need to search for <w:tblStyle w:val="TableNormal" />.
Upvotes: 2
Reputation: 51
Just add a table style called "Table" to the reference-doc file, formatted however you want, and update pandoc to the latest version.
Upvotes: 2
Reputation: 52689
edi9999 has the best answer but here's what I do:
When creating the docx, use a reference docx to get styles. That reference will contain a heap of other styles that just aren't used by Pandoc to create, but they are still in there. Typically you'll get the default sets, but you can add a new table style too.
Then, you only need to update the word\document.xml file to reference the new table style, and you can do that programmatically (by unzipping, running sed, and updating the docx archive), eg:
7z.exe x mydoc.docx word\document.xml
sed "s/<w:tblStyle w:val=\"TableNormal\"/<w:tblStyle w:val=\"NewTableStyle\"/g" word\document.xml > word\document2.xml
copy word\document2.xml word\document.xml /y
7z.exe u mydoc.docx word\document.xml
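The same unzip, replace and re-zip steps can also be sketched in Python with only the standard library, which avoids depending on 7z and sed. This is an illustration rather than a tested tool: the function name restyle_tables and its parameters are made up here, and it assumes the tables carry one of the default style values mentioned in this thread ("Table" or "TableNormal").

```python
import re
import zipfile


def restyle_tables(src, dst, new_style, old_styles=("Table", "TableNormal")):
    """Copy the docx at src to dst, pointing default-styled tables at new_style.

    Hypothetical helper; src and dst must be different paths.
    """
    pattern = re.compile(
        r'(<w:tblStyle\s+w:val=")(?:%s)(")' % "|".join(map(re.escape, old_styles))
    )
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                # A function replacement keeps new_style literal (no backslash escapes).
                text = pattern.sub(
                    lambda m: m.group(1) + new_style + m.group(2), text
                )
                data = text.encode("utf-8")
            # Every other archive member is copied through unchanged.
            zout.writestr(item, data)


if __name__ == "__main__":
    restyle_tables("mydoc.docx", "mydoc-styled.docx", "NewTableStyle")
```

As far as I know, w:val refers to the style's ID in styles.xml rather than its display name, so check the reference docx if the new style does not show up.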
Upvotes: 8
Reputation: 2289
Same suggestion as edi9999: hack the XML content of the converted docx. The following is my R code for doing that.
The tblPr variable contains the definition of the style to be added to the tables in the docx. You can modify the string to suit your own needs.
require(XML)
docx.file <- "report.docx"
tblPr <- '<w:tblPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:tblStyle w:val="a8"/><w:tblW w:w="0" w:type="auto"/><w:tblBorders><w:top w:val="single" w:sz="4" w:space="0" w:color="000000" w:themeColor="text1"/><w:left w:val="single" w:sz="4" w:space="0" w:color="000000" w:themeColor="text1"/><w:bottom w:val="single" w:sz="4" w:space="0" w:color="000000" w:themeColor="text1"/><w:right w:val="single" w:sz="4" w:space="0" w:color="000000" w:themeColor="text1"/><w:insideH w:val="single" w:sz="4" w:space="0" w:color="000000" w:themeColor="text1"/><w:insideV w:val="single" w:sz="4" w:space="0" w:color="000000" w:themeColor="text1"/></w:tblBorders><w:jc w:val="center"/></w:tblPr>'
## unzip the docx converted by Pandoc
system(paste("unzip", docx.file, "-d temp_dir"))
document.xml <- "temp_dir/word/document.xml"
doc <- xmlParse(document.xml)
tbl <- getNodeSet(xmlRoot(doc), "//w:tbl")
tblPr.node <- lapply(1:length(tbl), function (i)
xmlRoot(xmlParse(tblPr)))
added.Pr <- names(xmlChildren(tblPr.node[[1]]))
for (i in 1:length(tbl)) {
tbl.node <- tbl[[i]]
if ('tblPr' %in% names(xmlChildren(tbl.node))) {
children.Pr <- xmlChildren(xmlChildren(tbl.node)$tblPr)
for (j in length(added.Pr):1) {
if (added.Pr[j] %in% names(children.Pr)) {
replaceNodes(children.Pr[[added.Pr[j]]],
xmlChildren(tblPr.node[[i]])[[added.Pr[j]]])
} else {
## first.child <- children.Pr[[1]]
addSibling(children.Pr[['tblStyle']],
xmlChildren(tblPr.node[[i]])[[added.Pr[j]]],
after=TRUE)
}
}
} else {
addSibling(xmlChildren(tbl.node)[[1]], tblPr.node[[i]], after=FALSE)
}
}
## save hacked xml back to docx
saveXML(doc, document.xml, indent = F)
setwd("temp_dir")
system(paste("zip -r ../", docx.file, " *", sep=""))
setwd("..")
system("rm -fr temp_dir")
Upvotes: 11
Reputation: 20574
Here's how I searched how to do this:
The way to add a table in Docx is to use the <w:tbl> tag. So I searched for this in the GitHub repository, and found it in this file (called Writers/Docx.hs, so it's not a big surprise):
blockToOpenXML opts (Table caption aligns widths headers rows) = do
let captionStr = stringify caption
caption' <- if null caption
then return []
else withParaProp (pStyle "TableCaption")
$ blockToOpenXML opts (Para caption)
let alignmentFor al = mknode "w:jc" [("w:val",alignmentToString al)] ()
let cellToOpenXML (al, cell) = withParaProp (alignmentFor al)
$ blocksToOpenXML opts cell
headers' <- mapM cellToOpenXML $ zip aligns headers
rows' <- mapM (\cells -> mapM cellToOpenXML $ zip aligns cells)
$ rows
let borderProps = mknode "w:tcPr" []
[ mknode "w:tcBorders" []
$ mknode "w:bottom" [("w:val","single")] ()
, mknode "w:vAlign" [("w:val","bottom")] () ]
let mkcell border contents = mknode "w:tc" []
$ [ borderProps | border ] ++
if null contents
then [mknode "w:p" [] ()]
else contents
let mkrow border cells = mknode "w:tr" [] $ map (mkcell border) cells
let textwidth = 7920 -- 5.5 in in twips, 1/20 pt
let mkgridcol w = mknode "w:gridCol"
[("w:w", show $ (floor (textwidth * w) :: Integer))] ()
return $
[ mknode "w:tbl" []
( mknode "w:tblPr" []
( [ mknode "w:tblStyle" [("w:val","TableNormal")] () ] ++
[ mknode "w:tblCaption" [("w:val", captionStr)] ()
| not (null caption) ] )
: mknode "w:tblGrid" []
(if all (==0) widths
then []
else map mkgridcol widths)
: [ mkrow True headers' | not (all null headers) ] ++
map (mkrow False) rows'
)
] ++ caption'
I'm not familiar at all with Haskell, but I can see that the border-style is hardcoded, since there is no variable in it:
let borderProps = mknode "w:tcPr" []
[ mknode "w:tcBorders" []
$ mknode "w:bottom" [("w:val","single")] ()
, mknode "w:vAlign" [("w:val","bottom")] () ]
That means that you can't change the style of the docx tables with the current version of pandoc. However, there's a way to get your own style:
Open word/document.xml (inside the docx archive) and search for the <w:tbl> tag.
Here's a test with a border-style I created:
And here is the corresponding XML:
<w:tblBorders>
<w:top w:val="dotted" w:sz="18" w:space="0" w:color="C0504D" w:themeColor="accent2"/>
<w:left w:val="dotted" w:sz="18" w:space="0" w:color="C0504D" w:themeColor="accent2"/>
<w:bottom w:val="dotted" w:sz="18" w:space="0" w:color="C0504D" w:themeColor="accent2"/>
<w:right w:val="dotted" w:sz="18" w:space="0" w:color="C0504D" w:themeColor="accent2"/>
<w:insideH w:val="dotted" w:sz="18" w:space="0" w:color="C0504D" w:themeColor="accent2"/>
<w:insideV w:val="dotted" w:sz="18" w:space="0" w:color="C0504D" w:themeColor="accent2"/>
</w:tblBorders>
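If you want to produce a fragment like this for other border styles, a small generator helps. The sketch below is only an illustration: tbl_borders and its parameters are names invented here, and it leaves out the w:themeColor attribute shown above. Note that w:sz is measured in eighths of a point, so 18 means 2.25 pt.

```python
from xml.sax.saxutils import quoteattr

# Edge names in the order used by the XML above.
EDGES = ("top", "left", "bottom", "right", "insideH", "insideV")


def tbl_borders(val="single", sz=4, color="000000", edges=EDGES):
    """Build a <w:tblBorders> fragment for the given edges (hypothetical helper)."""
    lines = ["<w:tblBorders>"]
    for edge in edges:
        lines.append(
            '<w:%s w:val=%s w:sz=%s w:space="0" w:color=%s/>'
            % (edge, quoteattr(val), quoteattr(str(sz)), quoteattr(color))
        )
    lines.append("</w:tblBorders>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Only a top and a bottom rule, as asked for in the question.
    print(tbl_borders(edges=("top", "bottom")))
    # The dotted red border from the example above (without the theme colour).
    print(tbl_borders(val="dotted", sz=18, color="C0504D"))
```

The resulting string would go inside the table's <w:tblPr> element in word/document.xml.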
I haven't had a look at it yet; ask if you can't find it yourself using a similar method.
Hope this helps, and don't hesitate to ask if you need anything more.
Upvotes: 25