Stella

Reputation: 1868

Apache Beam - Reading JSON and Stream

I am writing Apache Beam code in which I need to read a JSON file that is placed in the project folder and then stream its data.

This is the sample code to read JSON. Is this the correct way of doing it?

PipelineOptions options = PipelineOptionsFactory.create();
options.setRunner(SparkRunner.class);

Pipeline p = Pipeline.create(options);

PCollection<String> lines = p.apply("ReadMyFile", TextIO.read().from("/Users/xyz/eclipse-workspace/beam-prototype/test.json"));
System.out.println("lines: " + lines);

Or should I use:

p.apply(FileIO.match().filepattern("/Users/xyz/eclipse-workspace/beam-prototype/test.json"))

I just need to read the JSON file below: read the complete testdata object from this file and then stream it.

{
  "testdata": {
    "siteOwner": "xxx",
    "siteInfo": {
      "siteID": "id_member",
      "siteplatform": "web",
      "siteType": "soap",
      "siteURL": "www"
    }
  }
}

The above code is not reading the JSON file; it is printing this:

lines: ReadMyFile/Read.out [PCollection]

Could you please guide me with a sample reference?

Upvotes: 1

Views: 2024

Answers (1)

Andrew Nguonly

Reputation: 2621

This is the sample code to read JSON. Is this the correct way of doing it?

To quickly answer your question: yes. Your sample code is the correct way to read a file containing JSON when each line of the file contains a single JSON element. The TextIO input transform reads a file line by line, so if a single JSON element spans multiple lines, as in your sample file, each line arrives as a separate string and cannot be parsed on its own.

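The second code sample does not have quite the same effect on its own: FileIO.match() only produces metadata for the matching files, so it must be followed by FileIO.readMatches() and a read step before any file contents are available.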
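Since the file in the question spans several lines, one option is to read each matched file as a single string rather than line by line. The following is a minimal sketch of that idea, not a definitive implementation. The class name ReadWholeJsonFile and the helper readWholeFiles are hypothetical names chosen for illustration, and the sketch assumes a runner (for example the direct runner) is on the classpath and that each file is small enough to hold in memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

// Hypothetical example class: reads every matched file as one String element.
public class ReadWholeJsonFile {

    // Builds the read steps on the given pipeline; nothing is read until the pipeline runs.
    static PCollection<String> readWholeFiles(Pipeline p, String filePattern) {
        return p
            // Emits metadata for each file matching the pattern (no contents yet).
            .apply("MatchFile", FileIO.match().filepattern(filePattern))
            // Turns each match into a readable file handle.
            .apply("OpenFile", FileIO.readMatches())
            // Reads the full contents of each file into a single String.
            .apply("ReadWholeFile", MapElements
                .into(TypeDescriptors.strings())
                .via((FileIO.ReadableFile file) -> {
                    try {
                        return file.readFullyAsUTF8String();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
    }

    public static void main(String[] args) {
        // Runner and other options can be supplied on the command line.
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Path taken from the question; each element is a complete JSON document.
        PCollection<String> documents =
            readWholeFiles(p, "/Users/xyz/eclipse-workspace/beam-prototype/test.json");

        // Further transforms (for example JSON parsing) would be applied to documents here.
        p.run().waitUntilFinish();
    }
}
```

Each element of documents is then the whole text of one file, which a later transform can hand to a JSON parser of your choice.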

The above code is not reading the JSON file; it is printing this:

The printed result is expected. The variable lines does not actually contain the JSON strings from the file. It is a PCollection of Strings, which simply represents the state of the pipeline after a transform is applied, so printing it shows only the name of the transform output. Elements in the pipeline are accessed by applying subsequent transforms: the actual JSON string can be accessed in the implementation of a transform, and only once the pipeline is executed with p.run().
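To illustrate accessing elements inside a transform, here is a minimal sketch that applies a ParDo to the PCollection and prints each element when the pipeline runs. The class name PrintLines, the DoFn name PrintFn, and the helper readAndPrint are hypothetical names chosen for illustration. The sketch assumes a runner is available on the classpath; passing --runner=SparkRunner on the command line would keep the runner from the question, in which case the output appears in the executor logs rather than the local console.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class: prints each line of a file from inside a transform.
public class PrintLines {

    // Prints each element and passes it through unchanged.
    static class PrintFn extends DoFn<String, String> {
        @ProcessElement
        public void processElement(@Element String line, OutputReceiver<String> out) {
            System.out.println("line: " + line);
            out.output(line);
        }
    }

    // Builds the read and print steps; nothing executes until the pipeline runs.
    static PCollection<String> readAndPrint(Pipeline p, String path) {
        PCollection<String> lines = p.apply("ReadMyFile", TextIO.read().from(path));
        return lines.apply("PrintLines", ParDo.of(new PrintFn()));
    }

    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // Path taken from the question.
        readAndPrint(p, "/Users/xyz/eclipse-workspace/beam-prototype/test.json");

        // The file is read and PrintFn is invoked only when the pipeline is run.
        p.run().waitUntilFinish();
    }
}
```

Because TextIO emits one element per line, PrintFn here sees the multi-line JSON file as separate fragments, one per line.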

Upvotes: 1
