Reputation: 2811
I have a bunch of .txt files with various names in a folder, and I need to merge them into a single file that can be opened in Microsoft Word or LibreOffice Writer.
The tricky part is that the merged files should be ordered by creation date, and each one should have a title before its content and a page break after it, like this:
Title of older file
File content
Page break
Title of newer file
File content
Page break
I could do this with Java, but that seems like overkill. It would be nice if this could be done with Windows PowerShell or Unix Bash. Any added newlines should be Windows-style (CRLF), though.
Full disclosure: I know something about Bash, little about PowerShell, and almost nothing about the .doc/.odf formats.
Upvotes: 1
Views: 924
Reputation: 3275
Merging TXTs into one DOCX and adding page breaks (PowerShell, requires MS Word):
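# Requires Microsoft Word; the script drives it through COM automation.
$oWord = New-Object -ComObject Word.Application
$oWord.Visible = $false
$sPath = "<path to dir with txt files>"
# Take only the .txt files, oldest first, as the question asks.
$cInFiles = Get-ChildItem -Path $sPath -Filter *.txt | Sort-Object CreationTime
$sOutFile = Join-Path $sPath "outfile.docx"
$iWordPageBreak = 7    # WdBreakType.wdPageBreak
$iSaveFormat = 16      # WdSaveFormat.wdFormatDocumentDefault (.docx)
$iNewLineChar = 11     # vertical tab; Word treats it as a manual line break
$oDoc = $oWord.Documents.Add()
$oWordSel = $oWord.Selection
foreach ($oInFile in $cInFiles) {
    # Get-Content returns an array of lines; join them with CR, Word's paragraph mark.
    $sInFileTxt = (Get-Content -LiteralPath $oInFile.FullName) -join "`r"
    $oWordSel.TypeText($oInFile.Name)    # title: the file name
    $oWordSel.TypeText([string][char]$iNewLineChar)
    $oWordSel.TypeText($sInFileTxt)
    $oWordSel.InsertBreak($iWordPageBreak)
}
$oDoc.SaveAs([ref]$sOutFile, [ref]$iSaveFormat)
$oDoc.Close()
$oWord.Quit()
# Release the COM object so WINWORD.EXE does not linger in the background.
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($oWord)
$oWord = $null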
For explanations see this blog post on TechNet.
Edit: without Word, you should probably use the ODT format and edit content.xml directly. Example in Python. Personally, though, I would simply concatenate the TXT files. Unless you have a million of them, it's faster and easier to add the page breaks manually than to edit the XML.
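For the "simply concatenate" route, a minimal Python sketch might look like the following. It is not the linked example. The names `merge_txt`, `creation_time`, and `sort_key` are made up for illustration, and UTF-8 input is assumed. It also assumes that a form feed character gets rendered as a page break when the merged file is opened: Word generally does this for plain text, but it is worth checking in Writer. Creation time is best-effort: `st_birthtime` where the platform exposes it, otherwise `st_ctime`, which is the creation time on Windows but only the last metadata change on most Unix systems.

```python
from pathlib import Path

FORM_FEED = "\f"  # assumption: the word processor shows this as a page break
CRLF = "\r\n"     # Windows-style newline, as the question requests


def creation_time(path):
    """Best-effort creation time: st_birthtime if present, else st_ctime."""
    st = path.stat()
    return getattr(st, "st_birthtime", st.st_ctime)


def merge_txt(folder, out_file, sort_key=creation_time, encoding="utf-8"):
    """Merge every .txt in folder into out_file; return the names in merge order."""
    out_file = Path(out_file)
    # Skip the output file itself in case it lives in the same folder.
    files = sorted(
        (p for p in Path(folder).glob("*.txt") if p.resolve() != out_file.resolve()),
        key=sort_key,
    )
    parts = []
    for p in files:
        # read_text normalizes line endings to "\n"; convert them to CRLF.
        body = p.read_text(encoding=encoding).rstrip("\n").replace("\n", CRLF)
        # Title (file name without extension), content, then a page break.
        parts.append(p.stem + CRLF + body + CRLF + FORM_FEED)
    # newline="" keeps Python from translating the CRLFs written above.
    with open(out_file, "w", encoding=encoding, newline="") as fh:
        fh.write("".join(parts))
    return [p.name for p in files]
```

Calling `merge_txt("notes", "notes/merged.txt")` (hypothetical paths) writes one block per file, oldest first. Passing a different `sort_key`, for example `lambda p: p.stat().st_mtime`, switches to modification time where creation time is not reliable.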
Upvotes: 1