user3331966

Reputation: 152

Splitting <dbl [2]> result of Sparklyr as a spark object

I'm having trouble splitting the output of a random forest model generated by sparklyr.

I'm using the following code to fit a model that predicts a {0 | 1} value, and then to predict the outcome for a specified validation set.

model <- ml_random_forest( tbl(sc,"train_set") , formulea)

prediction <- sdf_predict( model, tbl(sc,"validation_set") ) %>% select(account_no, probability , prediction)

This generated prediction object looks like:

Source:   query [3.744e+06 x 3]
Database: spark connection master=yarn-client app=Dev - model v.11 local=FALSE

   account_no probability prediction
        <dbl>      <list>      <dbl>
1     5053177   <dbl [2]>          1
2     6508441   <dbl [2]>          1
3     7805527   <dbl [2]>          1
4    10001696   <dbl [2]>          1
5    10004230   <dbl [2]>          1
6    10005647   <dbl [2]>          1
7    10006029   <dbl [2]>          1
8    10018558   <dbl [2]>          0
9    10019161   <dbl [2]>          1
10   10031652   <dbl [2]>          1
# ... with 3.744e+06 more rows

How can I split the list column in Spark so that I get only the first number of each list? Something like this:

   account_no probability 
        <dbl>      <dbl>
1     5053177   <0.9726>          
2     6508441   <0.1234>          

Hope someone can help to solve this issue.

Greetings, Jitske

Upvotes: 2

Views: 196

Answers (1)

kevinykuo

Reputation: 4772

Install the latest development version of sparklyr from GitHub and look up ?sdf_separate_column:

prediction %>%  
  sdf_separate_column("probability", c("p0", "p1"))
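To get exactly the two-column result asked for in the question, the separated column can be followed by a select. This is a minimal sketch, assuming `prediction` is the Spark table from the question and that `p0` (the first element of each probability vector) is the value of interest. The commented-out install line is an assumption about where the development version is hosted, and it may not be needed if the installed sparklyr release already provides `sdf_separate_column`.

```r
library(sparklyr)
library(dplyr)

# Assumption: development version hosted under this GitHub repository.
# devtools::install_github("rstudio/sparklyr")

# `prediction` is the Spark table from the question
# (columns: account_no, probability, prediction).
first_prob <- prediction %>%
  # Split the vector column into one double column per element.
  sdf_separate_column("probability", into = c("p0", "p1")) %>%
  # Keep only the account number and the first element of the vector.
  select(account_no, p0)

first_prob
```

The work stays in Spark, so the 3.7 million rows are not collected into R until `collect()` is called explicitly.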

Upvotes: 3
