Zach Moshe

Reputation: 2980

Is it possible to read a text file sequentially?

I'm using beam.io.ReadFromText to process data from text files.

Parsing these files is more complex than reading them line by line: some state has to be carried over and updated from one line to the next.

Can I make Beam read a file with a single worker, without parallelizing it? Are there other best practices for cases like this?

Upvotes: 2

Views: 877

Answers (1)

jkff

Reputation: 17913

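Yes. You are free to do arbitrary processing of files yourself, using the FileSystems API, which is what ReadFromText and all the other built-in file-based transforms use under the hood. Since each file name is a single element, the function below is called once per file and reads that file sequentially: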

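import apache_beam as beam
from apache_beam.io.filesystems import FileSystems

def ParseFile(name):
  # FileSystems.open returns a binary file-like object.
  with FileSystems.open(name) as f:
    ... Parse the file and yield elements ...

# The Python SDK transform is beam.FlatMap; the chain needs parentheses to span lines.
(p
 | beam.Create(['/path/to/file'])
 | beam.FlatMap(ParseFile))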
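
To make the "carry state from line to line" part concrete, here is a minimal sketch of what the parsing step inside ParseFile might look like. The file format is invented for illustration (a line starting with '#' names a section, and every other non-empty line is a record in that section), and parse_lines is a hypothetical helper, not part of Beam. It assumes the handle it receives yields byte lines when iterated, which is why the local check uses an in-memory binary file as a stand-in.

```python
import io


def parse_lines(lines):
    """Yield (section, record) pairs from an iterable of byte lines.

    Hypothetical format: a line starting with '#' sets the current section;
    every other non-empty line is a record belonging to that section.
    """
    section = None  # state carried from one line to the next
    for raw in lines:
        line = raw.decode('utf-8').strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('#'):
            section = line[1:].strip()  # update the carried state
        else:
            yield (section, line)


# Local check: an in-memory binary file stands in for the opened file handle.
sample = io.BytesIO(b"# fruits\napple\npear\n\n# tools\nhammer\n")
records = list(parse_lines(sample))
```

Inside ParseFile, the body of the with block could then be something like `yield from parse_lines(f)`, so the whole file is consumed in order within a single call.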

Upvotes: 4
