Kaushik

Reputation: 1324

Google's BigQuery vs Azure Data Lake U-SQL

I am trying to understand the differences, or the pros and cons, between Google's BigQuery and Azure Data Lake U-SQL. Which is better? I have searched exhaustively for the main differences but couldn't find them.

Upvotes: 3

Views: 4785

Answers (1)

Brij Raj Singh - MSFT

Reputation: 5113

OK, here are some fundamental differences between the two technologies.

Data Shape

  1. Google BigQuery - it asks you to transform your data into certain shapes such as JSON, CSV or Avro before loading.
  2. Azure Data Lake - it just asks you to dump whatever you have into the lake store, and you can run U-SQL queries on top of it.
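To make the "dump it as-is, shape it at query time" idea concrete, here is a minimal Python sketch. It does not use BigQuery or U-SQL; the file names, the `extract_rows` helper and the sample data are all hypothetical, and it only illustrates the general schema-on-read pattern, assuming each raw file gets a parser chosen when the query runs.

```python
import csv
import io
import json

# Hypothetical raw files, stored untouched as they might sit in a lake store.
RAW_FILES = {
    "clicks.csv": "user,clicks\nalice,3\nbob,5\n",
    "clicks.json": '{"user": "carol", "clicks": 7}\n{"user": "dave", "clicks": 2}\n',
}


def extract_rows(name, text):
    """Schema-on-read: choose a parser at query time based on the file type."""
    if name.endswith(".csv"):
        for row in csv.DictReader(io.StringIO(text)):
            yield {"user": row["user"], "clicks": int(row["clicks"])}
    elif name.endswith(".json"):
        # One JSON object per line.
        for line in text.splitlines():
            if line.strip():
                rec = json.loads(line)
                yield {"user": rec["user"], "clicks": int(rec["clicks"])}
    else:
        raise ValueError(f"no extractor for {name}")


def total_clicks(files):
    """Run one 'query' across every raw file, whatever its shape."""
    return sum(
        row["clicks"]
        for name, text in files.items()
        for row in extract_rows(name, text)
    )


print(total_clicks(RAW_FILES))
```

In a schema-on-write system the conversion inside `extract_rows` would happen once, before loading, rather than each time a query runs.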

Data Size

Google BigQuery - has limits on file size (https://cloud.google.com/bigquery/loading-data-into-bigquery#quota), though the limits are pretty generous.

Azure Data Lake - officially has no limit on file size; you can practically start with a petabyte-sized file.

The biggest difference is in the query model. Before getting to that, note that you can also run HBase workloads on top of Azure Data Lake Store, and HBase is an open source implementation of Google Bigtable. Many other subtle differences between those two are covered here: http://www.larsgeorge.com/2009/11/hbase-vs-bigtable-comparison.html.

A BigQuery query is not a compiled query per se, while U-SQL combines SQL-like syntax with CLR capabilities. U-SQL queries are first compiled and then run over the data store, which lets you write custom functions and use them in your queries to parse or work with different forms of data. You can even visualize the execution plan of a U-SQL query using the Azure Data Lake tools. Both BigQuery and U-SQL are pretty easy to understand and work with.
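As a rough analogy for "compile first, then run, with custom functions", here is a small Python sketch. It is not U-SQL and does not reflect how the U-SQL compiler works internally; the `UDFS` registry, `compile_query` and the sample rows are hypothetical names used only to show the idea that a custom function is resolved before any data is read.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry of user-defined functions (a stand-in for CLR code).
UDFS: Dict[str, Callable[[str], str]] = {
    "normalize_user": lambda s: s.strip().lower(),
}


def compile_query(column: str, udf_name: str) -> Callable[[Iterable[dict]], List[str]]:
    """Resolve the custom function up front, so a bad reference fails
    at 'compile' time, before any data is touched."""
    if udf_name not in UDFS:
        raise NameError(f"unknown function: {udf_name}")
    udf = UDFS[udf_name]

    def plan(rows: Iterable[dict]) -> List[str]:
        # The 'execution' step: apply the resolved function to each row.
        return [udf(row[column]) for row in rows]

    return plan


rows = [{"user": "  Alice "}, {"user": "BOB"}]
plan = compile_query("user", "normalize_user")
result = plan(rows)
print(result)
```

Calling `compile_query("user", "missing")` raises `NameError` without reading a single row, which is the practical benefit of a separate compile step.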

Authentication

  1. Google BigQuery - uses standard API authentication: https://cloud.google.com/bigquery/authentication
  2. Azure Data Lake - authentication of applications and users is controlled by Azure AD.

As big data platforms both demand respect, but I personally find Azure Data Lake a much better implementation, since it gives you the flexibility to work with open source projects like Spark, Storm, Hive, Pig etc., while BigQuery limits you to just the Google ecosystem.

Connect with me on Twitter at @brijrajsingh, and if you can make it, do drop by GIDS Bangalore, where I am delivering a session on data lakes on 29th April, 2016.

Upvotes: 9
