Kevin E
Kevin E

Reputation: 3486

Using LibreOffice's 'scalc' to convert tab-delimited text to Excel from the command line

I have a LF-terminated, tab-delimited input file that I wish to convert to Excel format (OOXML .xlsx, for example) using scalc --headless.

I am aware of other options like pyuno, and even some prewritten scripts that are out there like this one, but I see no reason to mess with Python in order to accomplish something that I feel certain scalc can do with built-in command-line options.

Here's what I've tried that didn't work:

scalc --headless --convert-to xlsx input.tsv
scalc --headless --infilter=tsv --convert-to xlsx input.tsv
scalc --headless --infilter=tsv:Text --convert-to xlsx input.tsv
scalc --headless --infilter=Text:tsv --convert-to xlsx input.tsv  # --> Error: no export filter

The last two were complete guesses, because, well, there is no real documentation for the --infilter option, other than the two examples in the output of scalc --help.

The last one produced an error, but in all other cases, the tab character is not recognized as field delimiter, the resulting Excel spreadsheet just has all the data concatenated together in the first column.

References

Upvotes: 0

Views: 2432

Answers (1)

Kevin E
Kevin E

Reputation: 3486

Although disappointingly undocumented, I was able to cobble together this working solution from several different sources:

scalc --headless --infilter="Text - txt - csv (StarCalc):9" \
      --convert-to xlsx input.tsv

The "9" is the decimal ASCII control code for horizontal tab. I used python -c 'print(ord("\t"))' to find this, although man ascii would've worked, too.

As detailed here, there are a number of comma-separated options that can be passed to this filter:

╔══════════════╤══════════╤═══════════╤═════════════╤══════════════════╤═════════════════════╗
║ option (pos) │ sep. (1) │ quote (2) │ charset (3) │ 1st line (4)     │ cell format (5)     ║
╠══════════════╪══════════╪═══════════╪═════════════╪══════════════════╪═════════════════════╣
║ defaults (?) │ 44 (,)   │ 34 (")    │ 0 (system)  │ 1                │ null ("standard")   ║
╟──────────────┼──────────┼───────────┼─────────────┼──────────────────┼─────────────────────╢
║ example      │ 9 (tab)  │ 39 (')    │ 76 (UTF-8)  │ 2 (skip 1st row) │ 1/9 (skip 1st col.) ║
╚══════════════╧══════════╧═══════════╧═════════════╧══════════════════╧═════════════════════╝

So, for example:

scalc --headless --infilter="Text - txt - csv (StarCalc):9,39,76,2,1/9" \
      --convert-to xlsx input.tsv

Would convert a tab-delimited input file with single quotes as text delimiters and UTF-8 encoding to an Excel .xlsx file, skipping the first row and column.

As mentioned above, the input record separator and quote character are the ASCII ordinals. The codes for the character set may be looked up here, but 0 (system default), 1 (Windows-1252), 12 (ISO-8859-1), and 76 (UTF-8) are the interesting ones for English-speakers.

The 5th place in the comma-separated option string, the "cell format," specifies the format for specific columns, and is delimited by forward slashes (col/fmt/col/fmt). This is explained in detail here, but briefly:

1   Standard
2   Text
3   MM/DD/YY
4   DD/MM/YY
5   YY/MM/DD
6   - (unused?)
7   - ("     ")
8   - ("     ")
9   ignore field (do not import)
10  US-English
    (',' as 1000s sep, '.' as decimal, regardless of locale)

There are even more values in the filter options string after that, which pertain to various options in the import/export dialog box such as "Quoted field as text." Refer again to the same OpenOffice wiki article linked above for an explanation of those, because I'm running out of steam here.

Sources

  • this Linux Journal article using pyuno, which clued me into the use of Text - txt - csv (StarCalc) as the filter name
    • which I corroborated by basically blundering into this directory in the LibreOffice source tree, where all the filters are defined
  • https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options
    • specifically, this section for the CSV filter
    • not-so-helpfully marked as "no longer current as of OpenOffice 4.1.x," but current enough to still solve my problem
    • this article is what gave me the idea to try the ASCII ordinal for tab as the first character of the "filter options" part of the --infilter string
    • that I should use the colon as a delimiter is something I inferred from this commit in 2014, which added the two --infilter examples to the output of soffice --help

Upvotes: 2

Related Questions