I'm using Kudu and Spark Streaming for a realtime dashboard. My problem is that when I join the batch from Spark Streaming with a Kudu table, the predicate is not pushed down to Kudu: Spark takes 2-3 seconds to fetch the entire table and only then filters it. Is there any way to avoid this?
Thanks,
Alexandru
1. Kudu is a columnar storage engine, so you can select only the columns you need. This reduces the amount of data pulled from Kudu.
2. Kudu supports predicate pushdown for >, <, >=, <=, =, BETWEEN and IN. If you filter the data read from Kudu with one of these predicates, the filter may be pushed down; you can then cache the filtered data.