Reputation: 355
I have a data frame as follows:
library(tidyverse)
m <- matrix(rep(1:28,each = 10), ncol = 28)
colnames(m) <- c("co1","col2", LETTERS)
df <- as_tibble(m)
df
newdf<-df %>%
unite("newcol", c(col1, col2), sep = " ", remove=F) %>%
gather("col4", "col5", A:Z)
I am trying to write the following and have it work in sparklyr:
dataframe %>%
unite(newcol, col1, col2, sep = " ", remove=F) %>%
gather("col4", "col5", A:Z)
Unite: Where newcol is the new column name that contains the united values of col1 and col2.
Gather: Where A:Z are the keys that are to be placed in the new col4. col5 is the new column that will contain the variable currently in columns A:Z
Does anyone have a user-defined function of a wrapper that will do the gather and unite function within the the spraklyr package perhaps using SQL?
Upvotes: 1
Views: 865
Reputation: 2469
You can do it with SQL and the DBI
library.
Assuming that you have your data.frame copied in Spark. (note that you have a typo in your colnames
function call, you have set the name of the first column to "co1", not "col1").
copy_to(sc, df)
Now, you can create a new Spark table with your desired columns with this:
DBI::dbSendQuery(sc, "CREATE TABLE new_df AS
SELECT CONCAT(t1.col1, t1.col2) AS newcol, t1.col1, t1.col2, t2.col4, t2.col5
FROM df t1
LATERAL VIEW EXPLODE (map('A', A, 'B', B, 'C', C, 'D', D, 'E', E, 'F', F, 'G', G,
'H', H, 'I', I, 'J', J, 'K', K, 'L', L, 'M', M, 'N', N, 'O', O, 'P', P, 'Q', Q,
'R', R, 'S', S, 'T', T, 'U', U, 'V', V, 'W', W, 'X', X, 'Y', Y, 'Z', Z)) t2 as col4, col5 ORDER BY col4 DESC")
Now, you can get the reference to this new Spark table and use it in Sparklyr:
ref_new_df <- tbl(sc, "new_df")
I hope this helps.
EDITED to show the result
> copy_to(sc, df)
> DBI::dbSendQuery(sc, "CREATE TABLE new_df AS
+ SELECT CONCAT(t1.col1, t1.col2) AS newcol, t1.col1, t1.col2, t2.col4, t2.col5
+ FROM df t1
+ LATERAL VIEW EXPLODE (map('A', A, 'B', B, 'C', C, 'D', D, 'E', E, 'F', F, 'G', G,
+ 'H', H, 'I', I, 'J', J, 'K', K, 'L', L, 'M', M, 'N', N, 'O', O, 'P', P, 'Q', Q,
+ 'R', R, 'S', S, 'T', T, 'U', U, 'V', V, 'W', W, 'X', X, 'Y', Y, 'Z', Z)) t2 as col4, col5 ORDER BY col4 DESC")
<DBISparkResult>
SQL CREATE TABLE new_df AS
SELECT CONCAT(t1.col1, t1.col2) AS newcol, t1.col1, t1.col2, t2.col4, t2.col5
FROM df t1
LATERAL VIEW EXPLODE (map('A', A, 'B', B, 'C', C, 'D', D, 'E', E, 'F', F, 'G', G,
'H', H, 'I', I, 'J', J, 'K', K, 'L', L, 'M', M, 'N', N, 'O', O, 'P', P, 'Q', Q,
'R', R, 'S', S, 'T', T, 'U', U, 'V', V, 'W', W, 'X', X, 'Y', Y, 'Z', Z)) t2 as col4, col5 ORDER BY col4 DESC
ROWS Fetched: 0 [complete]
Changed: 0
Upvotes: 3