Ashvin Suresh

Reputation: 1

Convert a string column to an integer column in Apache Beam?

I'm trying to convert a string column to an integer column in Google Cloud Dataflow. The column is mostly null values, with a few numbers stored as strings. Could anyone help me out with some Python code to do that?

Upvotes: 0

Views: 1428

Answers (1)

DMan

Reputation: 73

Looks like this has been sitting out here for a while. It would be helpful if you could post some example text/code showing what you have tried so far or what the data looks like. Here is the best I can do with limited information:

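    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def to_int(value):
        # int() raises ValueError on 'null', '' and '60.75', so map those to None
        try:
            return int(value)
        except ValueError:
            return None

    with beam.Pipeline(options=PipelineOptions()) as p:
        # Read the data; each line arrives as one string: '11139422, null, null, 60.75'
        your_data = p | 'Your_Data' >> beam.io.ReadFromText('/path/to/data.csv')
        # Split each row into its individual string values; FlatMap emits each one
        # as its own element: '11139422', ' null', ' null', ' 60.75'
        split_your_data = your_data | 'split' >> beam.FlatMap(lambda x: x.split(","))
        # Convert every value to an int, or None where it is not an integer
        your_data_to_int = split_your_data | 'String_to_Int' >> beam.Map(to_int)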
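
If you only want one column converted and the rest of the row left alone, something like the following might be closer to what you need. This is just a sketch in plain Python, since I don't know your real schema: the helper names `to_int_or_none` and `convert_column`, the comma delimiter, and the choice to turn nulls into `None` are all my assumptions.

```python
from typing import List, Optional


def to_int_or_none(value: str) -> Optional[int]:
    """Return int(value), or None for blanks, 'null', and non-integer text."""
    text = value.strip()
    if not text or text.lower() == "null":
        return None
    try:
        return int(text)
    except ValueError:
        # e.g. '60.75' is not a valid integer literal
        return None


def convert_column(line: str, column: int = 0) -> List[object]:
    """Split one CSV line and convert only the chosen column to int."""
    fields = [field.strip() for field in line.split(",")]
    fields[column] = to_int_or_none(fields[column])
    return fields
```

With the sample line above, `convert_column('11139422, null, null, 60.75')` returns `[11139422, 'null', 'null', '60.75']`. In the pipeline you should be able to pass it to `beam.Map(convert_column, column=1)` in place of the 'split' and 'String_to_Int' steps, since `beam.Map` forwards extra keyword arguments to the function. That keeps one list per row instead of flattening every value into a single PCollection.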

Upvotes: 1
