Vers_us
Vers_us

Reputation: 333

Import JSON into ClickHouse

I create table with this statement:

CREATE TABLE event(
    date Date,
    src UInt8,
    channel UInt8,
    deviceTypeId UInt8,
    projectId UInt64,
    shows UInt32,
    clicks UInt32,
    spent Float64
) ENGINE = MergeTree(date, (date, src, channel, projectId), 8192);

Raw data looks like:

{ "date":"2016-03-07T10:00:00+0300","src":2,"channel":18,"deviceTypeId ":101, "projectId":2363610,"shows":1232,"clicks":7,"spent":34.72,"location":"Unknown", ...}
...

Files with data loaded with the following command:

cat *.data|sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]+0300//'| clickhouse-client --query="INSERT INTO event FORMAT JSONEachRow"

clickhouse-client throw exception:

Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: location: (at row 1)

Is it possible to skip fields from JSON object that not presented in table description?

Upvotes: 9

Views: 14030

Answers (2)

uYSIZfoz
uYSIZfoz

Reputation: 911

Currently, it is not possible to skip unknown fields.

You may create temporary table with additional field, INSERT data into it, and then do INSERT SELECT into final table. Temporary table may have Log engine and INSERT into that "staging" table will work faster than into final MergeTree table.

It is relatively easy to add possibility to skip unknown fields into code (something like setting 'format_skip_unknown_fields').

Upvotes: 1

Vitaliy L.
Vitaliy L.

Reputation: 401

The latest ClickHouse release (v1.1.54023) supports input_format_skip_unknown_fields user option which eneables skipping of unknown fields for JSONEachRow and TSKV formats.

Try

clickhouse-client -n --query="SET input_format_skip_unknown_fields=1; INSERT INTO event FORMAT JSONEachRow;"

See more details in documentation.

Upvotes: 20

Related Questions