TobKel

Reputation: 1453

R: avoid rowwise() and find a faster alternative

I want to merge two vectors into one dataset and add the result to an existing dataset as 5 new columns using mutate(). Here is my example code:

library(dplyr)
library(purrr)  # for set_names()

vector1 <- list(c("Reply","Reshare","Like","Share","Search"),c("Reply","Reshare","Like","Share","Search"),c("Reply","Reshare","Like","Share","Search"))
vector2 <- list(c(1,2,6,3,4),c(3,7,9,2,4),c(5,2,8,4,0))

tibble(vector1=vector1,
       vector2=vector2)%>%
  rowwise()%>%
  mutate(vector2|> set_names(vector1)|> as.list()|> data.frame())

# A tibble: 3 x 7
# Rowwise: 
  vector1   vector2   Reply Reshare  Like Share Search
  <list>    <list>    <dbl>   <dbl> <dbl> <dbl>  <dbl>
1 <chr [5]> <dbl [5]>     1       2     6     3      4
2 <chr [5]> <dbl [5]>     3       7     9     2      4
3 <chr [5]> <dbl [5]>     5       2     8     4      0

This works quite well so far. However, my dataset is very large and the rowwise() solution is very slow. If I omit rowwise(), I get an error message.
I think the error arises because I convert the vectors to a list with as.list(), which mutate() does not seem to handle for the whole dataset at once.
Ideally, rowwise() would be dropped and only the code inside mutate() changed.
Can anyone help me with a faster solution?

Upvotes: 4

Views: 691

Answers (2)

Stefano Barbi

Reputation: 3184

I suggest using mapply:

library(dplyr)
library(magrittr)

tibble(vector1=vector1,
       vector2=vector2) %>%
  mutate(mapply(set_names, vector2, vector1, SIMPLIFY = FALSE) %>%
         do.call(rbind, .) %>%
         data.frame())

# A tibble: 3 × 7
  vector1   vector2   Reply Reshare  Like Share Search
  <list>    <list>    <dbl>   <dbl> <dbl> <dbl>  <dbl>
1 <chr [5]> <dbl [5]>     1       2     6     3      4
2 <chr [5]> <dbl [5]>     3       7     9     2      4
3 <chr [5]> <dbl [5]>     5       2     8     4      0

Here is a benchmark comparing rowwise() against mapply() on 100 rows (each vector of length 5) with shuffled labels:

vector1 <- replicate(sample(c("Reply","Reshare","Like","Share","Search"),
                            5,
                            replace = FALSE),
                     n = 100,
                     simplify = FALSE)

vector2 <- replicate(rnorm(5), n= 100, simplify = FALSE)


tb <- tibble(vector1 = vector1, vector2 = vector2)

library(microbenchmark)

microbenchmark(mapply = tb |>
                 mutate(mapply(set_names, vector2, vector1, SIMPLIFY = FALSE) %>%
                        do.call(rbind, .) |>
                        data.frame()),
               rowwise = tb %>%
                 rowwise() %>%
                 mutate(vector2 |> set_names(vector1) |> as.list() |> data.frame()))

Unit: milliseconds
    expr       min        lq      mean    median        uq       max neval
  mapply  2.439877  2.487191  2.630114  2.512208  2.576073  5.990312   100
 rowwise 37.309123 37.775255 39.386047 38.193196 41.221624 44.088820   100
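
One caveat with the shuffled labels in this benchmark: rbind() combines vectors by position and takes the column names from the first named vector, so rows whose labels come in a different order can end up under the wrong headers. Below is a minimal base R sketch that reorders each row to a common label order before binding. The helper name align_by_label and its arguments are illustrative assumptions, not part of the answer above.

```r
# Hypothetical helper: name each value vector with its labels, reorder it to
# a common label order, then bind the rows into one data frame.
align_by_label <- function(value_list, label_list, label_order) {
  rows <- Map(function(v, n) setNames(v, n)[label_order], value_list, label_list)
  out <- as.data.frame(do.call(rbind, rows))
  names(out) <- label_order
  rownames(out) <- NULL
  out
}

label_order <- c("Reply", "Reshare", "Like", "Share", "Search")

# Two rows carrying the same labels in different orders
label_list <- list(c("Reply", "Reshare", "Like", "Share", "Search"),
                   c("Search", "Share", "Like", "Reshare", "Reply"))
value_list <- list(c(1, 2, 6, 3, 4),
                   c(4, 2, 9, 7, 3))

wide <- align_by_label(value_list, label_list, label_order)
wide
```

If a row lacks one of the labels in label_order, the name lookup returns NA for that cell, so missing labels show up as NA rather than shifting the other values.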

Upvotes: 0

Ronak Shah

Reputation: 388807

If vector1 always has the same values in the same order, as in the example, we can do this more simply in base R:

do.call(rbind, vector2) |>
  as.data.frame() |>
  setNames(vector1[[1]])

#  Reply Reshare Like Share Search
#1     1       2    6     3      4
#2     3       7    9     2      4
#3     5       2    8     4      0
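
Because this shortcut relies on every element of vector1 being identical, it may be worth checking that assumption before using it. Here is a small sketch with the example data from the question; the variable names same_labels and wide are illustrative.

```r
vector1 <- list(c("Reply", "Reshare", "Like", "Share", "Search"),
                c("Reply", "Reshare", "Like", "Share", "Search"),
                c("Reply", "Reshare", "Like", "Share", "Search"))
vector2 <- list(c(1, 2, 6, 3, 4), c(3, 7, 9, 2, 4), c(5, 2, 8, 4, 0))

# TRUE only if every label vector matches the first one exactly (values and order)
same_labels <- all(vapply(vector1, identical, logical(1), vector1[[1]]))

# Stop early if the labels differ, since rbind() binds by position
stopifnot(same_labels)

wide <- do.call(rbind, vector2) |>
  as.data.frame() |>
  setNames(vector1[[1]])
```

The resulting data frame has one row per element of vector2, so it can be attached to the original dataset with cbind() or dplyr::bind_cols().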

Upvotes: 1
