Agostino

Reputation: 2811

Merge .txt files in one .doc, adding file names and page breaks

I have a bunch of .txt files with various names in a folder, and I need to merge them into a single file that can be read in Office Word or LibreOffice Writer.

The tricky part is that the merged files should be ordered by creation date, and each one should have its title before the content and a page break after it, like this:

Title of older file
File content
Page break

Title of newer file
File content
Page break

I could do this with Java, but that seems like overkill. It would be nice if this could be done using Windows PowerShell or Unix Bash. Added newlines should be Windows-style, though.

Full disclosure: I know something about Bash, little about PowerShell, and almost nothing about the .doc/.odf formats.

Upvotes: 1

Views: 924

Answers (1)

Alexander Obersht

Reputation: 3275

Merging TXTs into one DOCX and adding page breaks (PowerShell, requires MS Word):

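# Numeric values of the Word constants, so the script does not depend on
# the Office interop assembly being loaded.
$iWordFormatDocx = 16   # WdSaveFormat.wdFormatDocumentDefault
$iWordPageBreak = 7     # WdBreakType.wdPageBreak
$iNewLineChar = 11      # vertical tab; Word treats it as a manual line break

$sPath = "<path to dir with txt files>"
$sOutFile = Join-Path $sPath "outfile.docx"
# Only the .txt files, oldest first by creation date
$cInFiles = Get-ChildItem $sPath -Filter *.txt | Sort-Object CreationTime

$oWord = New-Object -ComObject Word.Application
$oWord.Visible = $false
$oDoc = $oWord.Documents.Add()
$oWordSel = $oWord.Selection

foreach ($sInFile in $cInFiles) {
    # Get-Content returns an array of lines; join them with a carriage
    # return, which Word treats as a paragraph mark.
    $sInFileTxt = (Get-Content $sInFile.FullName) -join "`r"

    $oWordSel.TypeText($sInFile.Name)          # title: the file name
    $oWordSel.TypeText([Char]$iNewLineChar)
    $oWordSel.TypeText($sInFileTxt)
    $oWordSel.InsertBreak($iWordPageBreak)
}

$oDoc.SaveAs($sOutFile, $iWordFormatDocx)
$oDoc.Close()
$oWord.Quit()
$oWord = $null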

For explanations see this blog post on TechNet.

Edit: without Word, you should probably use the ODT format and edit content.xml directly. Example in Python. Personally, though, I would simply concatenate the TXT files. Unless you have a million of them, it's faster and easier to add the page breaks manually than to actually edit XML.
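
For the plain-concatenation route, a rough Python sketch could look like the one below. It is not the ODT approach and it does not produce a real .docx: it writes one text file with each file name as a title, Windows-style line endings, and a form-feed character after each file. That the form feed will show up as a page break when the file is opened in Word or Writer is an assumption you should check on your setup. The names merge_texts, merge_folder and outfile.txt are made up for this example.

```python
from pathlib import Path

CRLF = "\r\n"      # Windows-style line ending, as the question asks
FORM_FEED = "\f"   # assumption: the word processor shows this as a page break


def merge_texts(entries):
    """Merge (title, created, text) tuples into one string, oldest first."""
    parts = []
    for title, _created, text in sorted(entries, key=lambda e: e[1]):
        # Normalise whatever line endings the text had to CRLF.
        body = CRLF.join(text.splitlines())
        parts.append(title + CRLF + body + CRLF + FORM_FEED)
    return "".join(parts)


def merge_folder(folder, out_name="outfile.txt"):
    """Merge every .txt file in `folder` and write the result next to them."""
    folder = Path(folder)
    out_path = folder / out_name
    entries = []
    for path in folder.glob("*.txt"):
        if path == out_path:
            continue  # do not merge a previous output into itself
        info = path.stat()
        # st_birthtime is the creation time where the platform exposes it;
        # the st_ctime fallback is the metadata-change time on Linux, so the
        # ordering is only approximate there.
        created = getattr(info, "st_birthtime", info.st_ctime)
        entries.append((path.name, created, path.read_text()))
    # newline="" keeps Python from translating the line endings again.
    with open(out_path, "w", newline="") as handle:
        handle.write(merge_texts(entries))
    return out_path
```

Calling merge_folder on the directory with the TXT files writes outfile.txt there; the ordering and formatting live in merge_texts, so they can be checked without touching the disk.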

Upvotes: 1
