user3022917
user3022917

Reputation: 617

How to read data from a CSV file where one column contains JSON data?

Is there a way to read the CSV column data as JSON?

I'm using PSDuckDB...and currently I'm using pure PowerShell to select this 'AuditData' column with JSON data and write it to a new JSON file before I can start with DuckDB queries.

Import-Csv -Path 'C:\M365-UnifiedAuditLog.csv' | Select-Object -ExpandProperty AuditData | Out-File 'C:\AuditData.json' -Encoding UTF8NoBOM

I'm new to DuckDB, but the speed is very promising!

Upvotes: -1

Views: 81

Answers (1)

peak
peak

Reputation: 116957

If the CSV is valid, and if there is a properly encoded JSON column (i.e., encoded according to the CSV encoding rules), then reading the JSON column can be done trivially.

Example: CSV with header, with column j as encoded JSON

id,j
1,"[1,2]"
2,"{""a"":3}"
$ duckdb
D from read_csv('json-in-csv.csv', types={'j': 'JSON'} );
┌───────┬─────────┐
│  id   │    j    │
│ int64 │  json   │
├───────┼─────────┤
│     1 │ [1,2]   │
│     2 │ {"a":3} │
└───────┴─────────┘
D 

One can also use columns or auto_type_candidates, e.g. as follow:

from read_csv('json-in-csv.csv', columns={'id': int64, 'j':JSON});
from read_csv('json-in-csv.csv', auto_type_candidates=['json', 'BIGINT']);

Things would of course be much simpler if you were dealing with a TSV file, since raw tabs cannot appear in valid JSON.

Upvotes: 0

Related Questions