abhishek jha

Reputation: 1095

Is it possible to skip leading rows when reading files in Google Dataflow?

I want to skip leading rows when reading files with Google Dataflow. Is that feature available in the latest version? The files are kept in Google Cloud Storage, and I will be writing their contents to BigQuery.

The bq load command has a --skip_leading_rows option, which skips the leading rows when reading from the files.

I want a similar feature in Google Dataflow: it should ignore the first line and write only the remaining lines to BigQuery. My input is in the following format:

[image of the input format]

Upvotes: 1

Views: 1006

Answers (1)

Graham Polley

Reputation: 14791

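This feature is not directly supported in Dataflow or its ParDo transforms.

You need to filter the header rows out yourself with a Filter transform, such as Filter.byPredicate() (or Filter.by(), depending on the SDK version).

For example: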

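// MatchIfNonHeader is a user-defined SerializableFunction<X, Boolean>
// that returns false for header rows and true for everything else.
PCollection<X> rows = ...;
PCollection<X> nonHeaders =
    rows.apply(Filter.by(new MatchIfNonHeader()));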
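
The snippet above leaves MatchIfNonHeader undefined, so here is a minimal sketch of what such a predicate might look like. It is not taken from the Dataflow SDK: the class body, the dropHeader helper, the header text "id,name", and the sample lines are all assumptions made for illustration, and a plain java.util.function.Predicate with a Java stream stands in for SerializableFunction and the PCollection so the example runs without any Dataflow dependency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class HeaderFilterSketch {

    // Hypothetical predicate: keeps a line unless it equals the known header text.
    // In a real pipeline this logic would sit in a SerializableFunction<String, Boolean>.
    static final class MatchIfNonHeader implements Predicate<String> {
        private final String header;

        MatchIfNonHeader(String header) {
            this.header = header;
        }

        @Override
        public boolean test(String line) {
            // Compare by content, ignoring surrounding whitespace.
            return !line.trim().equals(header);
        }
    }

    // Stand-in for rows.apply(Filter.by(...)): an ordinary stream filter.
    static List<String> dropHeader(List<String> lines, String header) {
        return lines.stream()
                .filter(new MatchIfNonHeader(header))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed sample data. The header is deliberately not first, because
        // elements of a PCollection carry no guaranteed order.
        List<String> lines = Arrays.asList("1,alice", "id,name", "2,bob");
        System.out.println(dropHeader(lines, "id,name"));
    }
}
```

The predicate identifies the header by its content rather than by its position. A PCollection is unordered, so "the first line" is not something a filter can see; matching on a known header value (or a pattern that only header rows satisfy) is the usual workaround. One caveat: a data row that happens to be identical to the header would be dropped as well.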

Upvotes: 2
