I want to skip leading rows when reading files with Google Dataflow. Is that feature available in the latest version? The files are kept in Google Cloud Storage, and I will be writing them to BigQuery.
The bq load command has a --skip_leading_rows option, which skips the leading rows when reading from the files.
I want a similar feature in Google Dataflow. My input is in the following format.
I want Google Dataflow to ignore the first line and write only the remaining lines to BigQuery.
This feature is not supported directly in Dataflow/ParDos.
Instead, drop the header rows with a predicate filter: Filter.byPredicate() in the Dataflow SDK, called Filter.by() in Apache Beam.
For example:
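```java
PCollection<X> rows = ...;  // e.g. lines read from Cloud Storage
// MatchIfNonHeader is your own SerializableFunction<X, Boolean>
// that returns false for header rows and true for everything else.
PCollection<X> nonHeaders =
    rows.apply(Filter.by(new MatchIfNonHeader()));
```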
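MatchIfNonHeader is not shown in the answer, so here is a minimal sketch of what it might look like. Because a PCollection has no ordering, the predicate cannot rely on "this is line 1"; it has to recognize the header by its content. The header string "id,name,value" is a hypothetical assumption (the question's sample input is not shown). To keep the sketch runnable without Beam on the classpath, it implements java.util.function.Function plus Serializable; in a real pipeline you would implement SerializableFunction<String, Boolean> with the same apply() body. The demo's stream filter is only a local stand-in for rows.apply(Filter.by(...)).

```java
import java.io.Serializable;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical predicate: keeps every line except the (assumed) header row.
// In Dataflow/Beam, implement SerializableFunction<String, Boolean> instead;
// the apply() logic stays the same.
class MatchIfNonHeader implements Function<String, Boolean>, Serializable {
    // Assumed header line; replace with the first line of your own files.
    static final String HEADER = "id,name,value";

    @Override
    public Boolean apply(String line) {
        // false = drop (header), true = keep (data row).
        // trim() also tolerates a trailing '\r' from Windows line endings.
        return !HEADER.equals(line.trim());
    }
}

class MatchIfNonHeaderDemo {
    public static void main(String[] args) {
        List<String> rows = List.of("id,name,value", "1,alice,42", "2,bob,7");
        MatchIfNonHeader keep = new MatchIfNonHeader();
        // Local stand-in for rows.apply(Filter.by(...)): keep lines where apply() is true.
        List<String> nonHeaders = rows.stream()
                .filter(keep::apply)
                .collect(Collectors.toList());
        System.out.println(nonHeaders); // [1,alice,42, 2,bob,7]
    }
}
```

Matching on content has a side benefit: if several input files each start with the same header, every copy is dropped. The trade-off is that a data row identical to the header would be dropped too.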