Reputation: 3486
I have a LF-terminated, tab-delimited input file that I wish to convert to Excel format (OOXML .xlsx
, for example) using scalc --headless
.
I am aware of other options like pyuno, and even some prewritten scripts that are out there like this one, but I see no reason to mess with Python in order to accomplish something that I feel certain scalc
can do with built-in command-line options.
Here's what I've tried that didn't work:
scalc --headless --convert-to xlsx input.tsv
scalc --headless --infilter=tsv --convert-to xlsx input.tsv
scalc --headless --infilter=tsv:Text --convert-to xlsx input.tsv
scalc --headless --infilter=Text:tsv --convert-to xlsx input.tsv # --> Error: no export filter
The last two were complete guesses, because, well, there is no real documentation for the --infilter
option, other than the two examples in the output of scalc --help
.
The last one produced an error, but in all other cases, the tab character is not recognized as field delimiter, the resulting Excel spreadsheet just has all the data concatenated together in the first column.
--infilter="Calc Office Open XML"
--infilter="Text (encoded):UTF8,LF,Liberation Mono,en-US"
which I assume indicate conversion from .xlsx to Calc, and converting from plain text to Writer, respectivelyUpvotes: 0
Views: 2432
Reputation: 3486
Although disappointingly undocumented, I was able to cobble together this working solution from several different sources:
scalc --headless --infilter="Text - txt - csv (StarCalc):9" \
--convert-to xlsx input.tsv
The "9" is the decimal ASCII control code for horizontal tab. I used python -c 'print(ord("\t"))'
to find this, although man ascii
would've worked, too.
As detailed here, there are a number of comma-separated options that can be passed to this filter:
╔══════════════╤══════════╤═══════════╤═════════════╤══════════════════╤═════════════════════╗
║ option (pos) │ sep. (1) │ quote (2) │ charset (3) │ 1st line (4) │ cell format (5) ║
╠══════════════╪══════════╪═══════════╪═════════════╪══════════════════╪═════════════════════╣
║ defaults (?) │ 44 (,) │ 34 (") │ 0 (system) │ 1 │ null ("standard") ║
╟──────────────┼──────────┼───────────┼─────────────┼──────────────────┼─────────────────────╢
║ example │ 9 (tab) │ 39 (') │ 76 (UTF-8) │ 2 (skip 1st row) │ 1/9 (skip 1st col.) ║
╚══════════════╧══════════╧═══════════╧═════════════╧══════════════════╧═════════════════════╝
So, for example:
scalc --headless --infilter="Text - txt - csv (StarCalc):9,39,76,2,1/9" \
--convert-to xlsx input.tsv
Would convert a tab-delimited input file with single quotes as text delimiters and UTF-8 encoding to an Excel .xlsx file, skipping the first row and column.
As mentioned above, the input record separator and quote character are the ASCII ordinals. The codes for the character set may be looked up here, but 0
(system default), 1
(Windows-1252), 12
(ISO-8859-1), and 76
(UTF-8) are the interesting ones for English-speakers.
The 5th place in the comma-separated option string, the "cell format," specifies the format for specific columns, and is delimited by forward slashes (col/fmt/col/fmt
). This is explained in detail here, but briefly:
1 Standard
2 Text
3 MM/DD/YY
4 DD/MM/YY
5 YY/MM/DD
6 - (unused?)
7 - (" ")
8 - (" ")
9 ignore field (do not import)
10 US-English
(',' as 1000s sep, '.' as decimal, regardless of locale)
There are even more values in the filter options string after that, which pertain to various options in the import/export dialog box such as "Quoted field as text." Refer again to the same OpenOffice wiki article linked above for an explanation of those, because I'm running out of steam here.
Text - txt - csv (StarCalc)
as the filter name
--infilter
string--infilter
examples to the output of soffice --help
Upvotes: 2