Reputation: 327
My input is a JSON list, and I want a PCollection with one element per item in that list. This is my code:
def parse_json(data):
    import json
    for i in json.loads(data):
        return i

data = (p
        | "Read text" >> beam.io.textio.ReadFromText(f'gs://{bucket_name}/not_processed/2020-06-08T23:59:59.999Z__rms004_m1__not_sent_msg.txt')
        | "Parse json" >> beam.Map(parse_json))
The problem is that I only get the first element of the list, even when the list contains 2 elements.
How do I achieve this?
Upvotes: 1
Views: 2367
Reputation: 327
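I figured it out.
Apache Beam has a transform called ParDo for exactly this. It can emit any number of output elements per input, so if the function returns the whole list, each item in it becomes a separate PCollection element.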
def parse_json(data):
    import json
    return json.loads(data)

data = (p
        | "Read text" >> beam.io.textio.ReadFromText(f'gs://{bucket_name}/not_processed/2020-06-08T23:59:59.999Z__rms004_m1__not_sent_msg.txt')
        | "Parse json" >> beam.ParDo(parse_json))
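For anyone wondering why the version in the question only produced the first element: `return` inside the `for` loop exits the function on the first iteration, and `Map` emits exactly one output per input anyway. Here is a minimal sketch in plain Python (no Beam needed) that shows the difference. The function names and the sample line are made up for illustration, and it assumes each line of the file is a complete JSON list.

```python
import json


def parse_json_first_only(data):
    # Same logic as the question: `return` exits on the first iteration,
    # so only the first item of the list is ever produced.
    for item in json.loads(data):
        return item


def parse_json_all(data):
    # Generator version: yields every item of the list, one at a time.
    for item in json.loads(data):
        yield item


# Hypothetical input line: one JSON list containing two objects.
line = '[{"id": 1}, {"id": 2}]'

first_only = parse_json_first_only(line)
all_items = list(parse_json_all(line))

print(first_only)
print(all_items)
```

In the pipeline, `beam.ParDo(parse_json)` with the version that returns the whole list should behave like the generator here, and as far as I can tell `beam.FlatMap(parse_json)` is the more usual way to write the same step.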
Upvotes: 2