Apache Drill query data retrieval is not consistent on HDFS

Question

I am working with Apache Drill and HDFS in my project.

I am dealing with a very large file (e.g. 150 GB) stored in HDFS. I write my Drill queries so that each one fetches a small batch of rows (e.g. 100), I process that batch, and then I fire another query on the same file for the next batch. I expected this to improve performance (e.g. SELECT * FROM dfs.file path LIMIT 100).
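A minimal sketch of the batch-by-batch approach described above, written as a Python helper that builds one query per page. The file path, the sort column `id`, and the function name `build_page_query` are hypothetical placeholders. The sketch assumes the file has a column that uniquely identifies each row. In SQL generally, a query without ORDER BY does not guarantee any row order, so this sketch adds one so that consecutive LIMIT/OFFSET pages cannot overlap. The LIMIT ... OFFSET ... ROWS form follows the clause order in Drill's SQL reference, but treat the exact syntax as an assumption to check against your Drill version.

```python
def build_page_query(table_path: str, order_column: str, page_size: int, page_number: int) -> str:
    """Build the Drill SQL text for one page of rows (hypothetical helper).

    ORDER BY on a unique column (assumed to exist) fixes the row order,
    so page N and page N+1 never return the same rows.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if page_number < 0:
        raise ValueError("page_number must be non-negative")
    offset = page_size * page_number  # rows to skip before this page starts
    return (
        f"SELECT * FROM dfs.`{table_path}` "
        f"ORDER BY {order_column} "
        f"LIMIT {page_size} OFFSET {offset} ROWS"
    )


if __name__ == "__main__":
    # Hypothetical path and key column, for illustration only.
    for page in range(3):
        print(build_page_query("/data/big_file.csv", "id", 100, page))
```

The trade-off in this design: ordering makes the pages deterministic, but every page query must consider all rows of the file to find its slice, which can be expensive on a file of this size.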

But every time I run a query on that file in HDFS, I do not get consistent results: the returned rows change from one run to the next, because Hadoop can fetch the data from any node in the cluster.

As a result, while paging through all of the records, I may receive rows that I have already retrieved.
