R overflow

Reputation: 1352

Translate Spark SQL function to "normal" R code

I am trying to follow a vignette, "How to make a Markov Chain" (http://datafeedtoolbox.com/attribution-theory-the-two-best-models-for-algorithmic-marketing-attribution-implemented-in-apache-spark-and-r/).

This tutorial is interesting because it uses the same data source as I do. However, part of the code relies on Spark SQL functions (as I learned from my previous question, "Concat_ws() function in Sparklyr is missing").

My question: I googled a lot and tried to solve this myself, but I have no idea how, since I don't know exactly what the data should look like (the author didn't give an example of his data frame before and after the function).

How can I transform this piece of code into "normal" R code, without using Spark? The concat_ws() and collect_list() functions in particular are causing trouble.

The author uses this code:

channel_stacks = data_feed_tbl %>%
 group_by(visitor_id, order_seq) %>%
 summarize(
   path = concat_ws(" > ", collect_list(mid_campaign)),
   conversion = sum(conversion)
 ) %>% ungroup() %>%
 group_by(path) %>%
 summarize(
   conversion = sum(conversion)
 ) %>%
 filter(path != "") %>%
 collect()

From my previous question, I know that part of the code can be replaced:

concat_ws() can be replaced by the paste() function

But that leaves another Spark function:

collect_list()  # description: Aggregate function: returns a list of objects with duplicates.

I hope I have described this question as clearly as possible.

Upvotes: 0

Views: 153

Answers (1)

zacdav

Reputation: 4671

paste() can collapse a character vector into a single string, joining the elements with the separator given in its collapse argument.
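As a quick illustration of the difference between sep and collapse, here is a minimal sketch in base R. The vector touches and its values are made up for the example and are not taken from the tutorial's data.

```r
# Hypothetical campaign touches for a single visitor (illustrative values only)
touches <- c("email", "search", "display")

# sep only separates the arguments passed to paste(); with a single
# vector argument the result is still one string per element
unchanged <- paste(touches, sep = " > ")
print(length(unchanged))

# collapse joins all elements of the result into a single string
path <- paste(touches, collapse = " > ")
print(path)
# [1] "email > search > display"
```

So collapse, not sep, is the argument that plays the role of the concat_ws() separator here.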

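paste(mid_campaign, collapse = " > ") can act as a drop-in replacement for concat_ws(" > ", collect_list(mid_campaign)), and the final collect() is no longer needed because the data is already a local data frame: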

library(dplyr)

channel_stacks = data_feed_tbl %>%
  # one path per visitor and order sequence
  group_by(visitor_id, order_seq) %>%
  summarize(
    path = paste(mid_campaign, collapse = " > "),
    conversion = sum(conversion)
  ) %>% ungroup() %>%
  # total conversions per distinct path
  group_by(path) %>%
  summarize(
    conversion = sum(conversion)
  ) %>%
  filter(path != "")
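Since the tutorial does not show the data before and after this step, here is a small sketch with made-up data. The column names come from the code above, but the rows of this data_feed_tbl are purely hypothetical, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for data_feed_tbl: one row per campaign touch
data_feed_tbl <- tibble(
  visitor_id   = c(1, 1, 1, 2, 2, 3),
  order_seq    = c(1, 1, 1, 1, 1, 1),
  mid_campaign = c("email", "search", "display", "email", "search", "display"),
  conversion   = c(0, 0, 1, 0, 1, 0)
)

channel_stacks <- data_feed_tbl %>%
  # collapse each visitor's touches into a single path string
  group_by(visitor_id, order_seq) %>%
  summarize(
    path = paste(mid_campaign, collapse = " > "),
    conversion = sum(conversion),
    .groups = "drop"
  ) %>%
  # sum conversions over identical paths
  group_by(path) %>%
  summarize(conversion = sum(conversion), .groups = "drop") %>%
  filter(path != "")

print(channel_stacks)
```

With this input the result has one row per distinct path: "email > search > display" and "email > search" with one conversion each, and "display" with none. Two caveats, as far as I can tell: paste() joins the values in the row order of the data frame, so if the data has a timestamp column it is probably worth sorting by it before grouping; and paste() turns NA into the string "NA", whereas the Spark functions skip nulls, so missing mid_campaign values may need to be filtered out first.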

Upvotes: 1
