Reputation: 1138
I'm trying to use U-SQL and R to forecast, so I need to pass a list of values from U-SQL to R and get the forecast back from R into U-SQL.
All the examples I found use a reducer, so they process only one row at a time.
https://learn.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-r-extensions
Instead of sending R a list of columns, is it possible to send it a list of rows to process?
Thanks!
Upvotes: 3
Views: 331
Reputation: 440
There is another important detail that may be the cause of the problem you mentioned: partitioning. The REDUCE expression splits the analysis workload into partitions. Each partition is processed independently and in parallel, and all results are collected by the REDUCE operation at the end. When using R for prediction, we need all rows at once to run the algorithm, so we can't partition the data. If no partitioning is required, we can use REDUCE ALL. Another way is to specify a pseudo partition (the same partition value for all rows), as in the sketch below.
Check an example here: https://github.com/Azure/ADLAwithR-GettingStarted/tree/master/Tutorial/Exercise5
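For illustration, a minimal sketch of both options; the rowset name @mySourceData, its columns, and the PseudoKey column are assumptions, and the PRODUCE schema depends on what your R script returns:
// Option 1: REDUCE ALL sends the whole rowset to a single R vertex.
@RScriptOutputAll =
    REDUCE @mySourceData ALL
    PRODUCE PeriodStart string, Forecast double
    USING new Extension.R.Reducer(command:@myRScript, rReturnType:"dataframe");
// Option 2: a constant pseudo-partition key puts every row into the same partition.
@withKey =
    SELECT 1 AS PseudoKey,
           PeriodStart,
           Value
    FROM @mySourceData;
@RScriptOutputKeyed =
    REDUCE @withKey ON PseudoKey
    PRODUCE PseudoKey int, PeriodStart string, Forecast double
    USING new Extension.R.Reducer(command:@myRScript, rReturnType:"dataframe");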
Upvotes: 0
Reputation: 440
By definition, user-defined reducers take n rows and produce one or more rows; you can use them to produce new column data as well as new rows. The R extensions for U-SQL include a built-in reducer (Extension.R.Reducer) that runs R code on each vertex assigned to the reducer. Inside the R script, the input rowset is exposed as the special inputFromUSQL data frame, which you can work on with R.
As in the example you referenced, this should work on all rows at once:
DECLARE @myRScript = @"
inputFromUSQL$mydata = as.factor(inputFromUSQL$mydata)
<..>
";
@myData = <my u-sql query>
@RScriptOutput = REDUCE @myData <..>
USING new Extension.R.Reducer(command:@myRScript, rReturnType:"dataframe");
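For a fuller picture, a hedged end-to-end sketch of forecasting over all rows at once; the input file, column names, and the ARIMA model are placeholders I've assumed, while REFERENCE ASSEMBLY [ExtR] and the inputFromUSQL/outputToUSQL variables come from the R extensions documentation:
REFERENCE ASSEMBLY [ExtR];

DECLARE @myRScript string = @"
# All rows arrive in one data frame because REDUCE ALL forms a single group.
fit <- arima(inputFromUSQL$Value, order = c(1, 0, 0))   # placeholder model
pred <- predict(fit, n.ahead = 12)
outputToUSQL <- data.frame(Step = 1:12, Forecast = as.numeric(pred$pred))
";

@myData =
    EXTRACT PeriodStart string,
            Value double
    FROM "/input/history.csv"
    USING Extractors.Csv();

@RScriptOutput =
    REDUCE @myData ALL
    PRODUCE Step int, Forecast double
    USING new Extension.R.Reducer(command:@myRScript, rReturnType:"dataframe");

OUTPUT @RScriptOutput
TO "/output/forecast.csv"
USING Outputters.Csv();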
Upvotes: 1