Can Apache Drill query a list of files with updated data?

Question

I have a large CSV file (more than 8.5 GB) that is updated on the first day of each month. From the 2nd through the last day of the month, new or updated records can arrive in JSON format.

I convert the CSV to Parquet and query it in Apache Drill, which works fine. But how can I query the big file together with the update files?

For example, the April 1 CSV file contains:

ID          Name           Value    LastUpdatedTime
100         John           98       2024-01-05

The April 15 JSON file contains:

ID          Name           Value    LastUpdatedTime
100         John           100      2024-04-15

When I query all of these files for ID = 100, the result should be Value = 100, because that row has the newer LastUpdatedTime.
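
To make the expected behavior concrete, here is a minimal Python sketch of the rule I am after: for each ID, keep only the row with the greatest LastUpdatedTime. This is not Drill code. The function name `latest_per_id` and the in-memory row dictionaries are made up for illustration and simply stand in for the Parquet and JSON files.

```python
from datetime import date


def latest_per_id(*sources):
    """Return a dict mapping each ID to its row with the newest LastUpdatedTime."""
    latest = {}
    for rows in sources:
        for row in rows:
            current = latest.get(row["ID"])
            # Keep the row if the ID is new or this row is more recent.
            if current is None or row["LastUpdatedTime"] > current["LastUpdatedTime"]:
                latest[row["ID"]] = row
    return latest


# Hypothetical stand-ins for the monthly file and a mid-month update file.
base_rows = [
    {"ID": 100, "Name": "John", "Value": 98, "LastUpdatedTime": date(2024, 1, 5)},
]
update_rows = [
    {"ID": 100, "Name": "John", "Value": 100, "LastUpdatedTime": date(2024, 4, 15)},
]

merged = latest_per_id(base_rows, update_rows)
print(merged[100]["Value"])  # prints 100
```

The order in which the sources are passed does not matter, since the comparison is on LastUpdatedTime alone.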

I found this post, which says that people use Drill on data that is no longer changing.

Is that true?

Can Apache Drill query a list of files with updated data?

Answers (1)
