erich
erich

Reputation: 71

Is it better to use Row or GenericRowData with DataStream API?

I Am working with flink 1.15.2, should i use Row or GenericRowData that inherit RowData for my own data type?, i mostly use streaming api. Thanks. Sig.

Upvotes: 1

Views: 2244

Answers (2)

Milimetric
Milimetric

Reputation: 13549

The docs are actually helpful here, I was confused too.

The section at the top titled "All Known Implementing Classes" lists all the implementations. RowData and GenericRowData are described as internal data structures. If you can use a POJO, then great. But if you need something that implements RowData, take a look at BinaryRowData, BoxedWrapperRowData, ColumnarRowData, NestedRowData, or any of the implementations there that aren't listed as internal.

I'm personally using NestedRowData to map a DataStream[Row] into a DataStream[RowData] and I'm not at all sure that's a good idea :) Especially since I can't seem to add a string attribute

Upvotes: 1

twalthr
twalthr

Reputation: 2654

In general the DataStream API is very flexible when it comes to record types. POJO types might be the most convenient ones. Basically any Java class can be used but you need to check which TypeInformation is extracted via reflection. Sometimes it is necessary to manually overwrite it.

For Row you will always have to provide the types manually as reflection cannot do much based on class signatures.

GenericRowData should be avoided, it is rather an internal class with many caveats (strings must be StringData and array handling is not straightforward). Also GenericRowData becomes BinaryRowData after deserialization. TLDR This type is meant for the SQL engine.

Upvotes: 3

Related Questions