Reputation: 3047
I have a lot of data files that will eventually be pushed to and stored in Azure Storage/Data Lake at regular intervals. I want to offer analytics on this data, but I see that Azure has two approaches:
Can someone suggest when to use each approach? It looks to me like both can do a similar job.
Upvotes: 11
Views: 4622
Reputation: 3958
You can think of U-SQL as Microsoft's version of Spark SQL: you write SQL Server-styled SQL and extend it with user-defined functions in C#. With Spark, you write a roughly MySQL-styled (HiveQL-like) SQL and extend it with either Scala or Python.
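To make the "SQL plus user-defined functions" idea concrete on the Spark side, here is a minimal PySpark sketch. The table name `events`, the sample rows, and the helper `normalize_region` are all made up for illustration; it also assumes `pyspark` is installed wherever you call `run_demo()`.

```python
def normalize_region(raw):
    """Plain Python function that gets exposed to SQL as a UDF (hypothetical helper)."""
    if raw is None:
        return None
    return raw.strip().upper()


def run_demo():
    # pyspark is imported lazily so the helper above can be reused without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

    # Hypothetical sample rows standing in for files stored in the data lake.
    rows = [(" eu-west ", 10), ("us-east", 5), ("EU-WEST", 7)]
    spark.createDataFrame(rows, ["region", "hits"]).createOrReplaceTempView("events")

    # Register the Python function so that SQL text can call it by name.
    spark.udf.register("normalize_region", normalize_region, StringType())

    result = spark.sql(
        "SELECT normalize_region(region) AS region, SUM(hits) AS total_hits "
        "FROM events GROUP BY normalize_region(region)"
    )
    result.show()
    spark.stop()
```

Calling `run_demo()` on a machine with Spark available runs the query; the U-SQL equivalent would put the same kind of logic in a C# function and call it from the script.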
If you are familiar with Scala or Python, then HDInsight might be the best choice. Spark comes with GraphX and MLlib, which at the moment have no analogues in Data Lake Analytics. Also, if you need something that works outside of Azure, then Spark SQL is your only option.
Another important dimension to think about is pricing. Data Lake Analytics only costs money while your query is executing, whereas HDInsight costs money for as long as the cluster is running. Depending on the size of the data and the complexity of your queries, Data Lake Analytics can be cheaper because you aren't charged while it's provisioning or sitting idle.
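A rough way to compare the two billing models is a break-even calculation. The sketch below is only illustrative: the rates, node counts, and unit counts are placeholder numbers, not real Azure prices, so plug in current figures from the pricing pages before drawing conclusions.

```python
def pay_per_job_cost(job_hours, units, rate_per_unit_hour):
    """Cost when you are billed only while queries execute."""
    return job_hours * units * rate_per_unit_hour


def cluster_cost(hours_up, nodes, rate_per_node_hour):
    """Cost when you are billed for as long as the cluster is running."""
    return hours_up * nodes * rate_per_node_hour


def break_even_job_hours(units, rate_per_unit_hour, hours_up, nodes, rate_per_node_hour):
    """Job hours at which pay-per-job billing costs the same as the cluster."""
    return cluster_cost(hours_up, nodes, rate_per_node_hour) / (units * rate_per_unit_hour)


# Placeholder inputs (NOT real prices): a 4-node cluster kept up 24 hours a day
# versus jobs that each use 10 analytics units.
hours = break_even_job_hours(
    units=10, rate_per_unit_hour=2.0,
    hours_up=24, nodes=4, rate_per_node_hour=0.5,
)
print(f"Pay-per-job is cheaper below {hours:.1f} job hours per day")
```

If your daily query time stays well under the break-even figure, the pay-per-query model tends to win; if the cluster would be busy most of the day anyway, the always-on cluster can come out ahead.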
Upvotes: 17