user2107356

Reputation: 115

Reading JSON file with BigQuery to make table

I'm new to Google Dataflow and can't get it to work with JSON. I've been reading through the documentation, but can't solve my problem.

So, following the WordCount example, I figured out how data is loaded from a .csv file with the following line:

PCollection<String> input = p.apply(TextIO.Read.from(options.getInputFile()));

where inputFile is a .csv file from my gcloud bucket. I can transform the lines read from the .csv with:

PCollection<TableRow> table = input.apply(ParDo.of(new ExtractParametersFn()));

(ExtractParametersFn is defined by me). So far so good!
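For reference, my ExtractParametersFn looks roughly like this (the field names are just placeholders for my actual columns):

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

// Turns one CSV line into a BigQuery TableRow.
static class ExtractParametersFn extends DoFn<String, TableRow> {
    @Override
    public void processElement(ProcessContext c) {
        String[] fields = c.element().split(",");
        TableRow row = new TableRow()
            .set("name", fields[0])    // placeholder column
            .set("value", fields[1]);  // placeholder column
        c.output(row);
    }
}
```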


But then I realized my .csv file was too big and had to convert it to JSON (https://cloud.google.com/bigquery/preparing-data-for-bigquery). Since BigQueryIO is supposedly better for reading JSON, I tried the following code:

 PCollection<TableRow> table = p.apply(BigQueryIO.Read.from(options.getInputFile()));

(inputFile is then a JSON file, and the output when reading with BigQueryIO is a PCollection of TableRows.) I tried with TextIO too (which returns a PCollection of Strings), and neither of the two IO options works.

What am I missing? The documentation is really not that detailed to find an answer there, but perhaps some of you guys already dealt with this problem before?

Any suggestions would be much appreciated. :)

Upvotes: 3

Views: 1980

Answers (1)

Tudor Marian

Reputation: 239

I believe there are two options to consider:

  1. Use TextIO with TableRowJsonCoder to ingest the JSON files (e.g., as is done in the TopWikipediaSessions example);
  2. Import the JSON files into a BigQuery table (https://cloud.google.com/bigquery/loading-data-into-bigquery), and then use BigQueryIO.Read to read from the table.
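Option 1 would look roughly like this (a sketch assuming newline-delimited JSON, i.e. one JSON object per line, which is the format BigQuery expects; the bucket path is a placeholder):

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.coders.TableRowJsonCoder;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.values.PCollection;

// Read the file line by line; the coder decodes each line
// directly into a TableRow, so no custom ParDo is needed.
PCollection<TableRow> table = p.apply(
    TextIO.Read
        .from("gs://my-bucket/input.json")   // placeholder path
        .withCoder(TableRowJsonCoder.of()));
```

This keeps your existing TextIO-based pipeline shape but swaps the default String coder for TableRowJsonCoder, which is exactly what the TopWikipediaSessions example does.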

Upvotes: 3
