Reputation: 1
I'm trying to convert a string column that contains null values and a few numbers stored as strings into an integer column in Google Cloud Dataflow. Could anyone help me out with Python code to do that?
Upvotes: 0
Views: 1428
Reputation: 73
Looks like this has been sitting out here for a while. It would be helpful if you could post an example of what you have tried so far or what the data looks like. Here is the best I can do with limited information:
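import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions()) as p:
    # This reads in the data; each line arrives as one string,
    # e.g. '11139422, null, null, 60.75'
    your_data = p | 'Your_Data' >> beam.io.ReadFromText('/path/to/data.csv')
    # Split each line into its own values and trim whitespace.
    # FlatMap emits one element per value: '11139422', 'null', 'null', '60.75'
    split_your_data = your_data | 'split' >> beam.FlatMap(lambda x: [v.strip() for v in x.split(",")])
    # Convert digit-only strings to int. A bare int(w) raises ValueError on
    # 'null', '' and '60.75', so those values become None instead.
    your_data_to_int = split_your_data | 'String_to_Int' >> beam.Map(lambda w: int(w) if w.isdecimal() else None)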
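One thing to watch for: a plain int() call raises ValueError on 'null', on empty strings, and on decimals such as '60.75'. Below is a minimal sketch of a more tolerant converter that also accepts negative numbers. The name to_int_or_none is hypothetical, and mapping every missing or non-integer value to None is my assumption about what you want, so adjust it if you would rather drop those values or round the decimals.

```python
def to_int_or_none(value):
    """Convert a string to int; return None when it is not a valid integer."""
    if value is None:
        return None
    text = value.strip()
    # Treat blanks and the literal 'null' (any case) as missing.
    if text == "" or text.lower() == "null":
        return None
    try:
        return int(text)
    except ValueError:
        # Not an integer, e.g. '60.75' or 'abc'.
        return None


# Plain-Python check using the sample row from the comments above.
row = "11139422, null, null, 60.75"
converted = [to_int_or_none(v) for v in row.split(",")]
print(converted)  # [11139422, None, None, None]
```

In the pipeline this would take the place of the lambda in the 'String_to_Int' step, i.e. beam.Map(to_int_or_none).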
Upvotes: 1