Alex

Reputation: 1294

map() to iterate over columns of dataframe

I would like to use map() from the purrr package to iterate over a subset of variables of my data frame. Is there a standard and convenient approach for that? Take the following example dataset:

library(data.table)
library(purrr)
library(tidyr)  # provides separate()
dt <- data.table(id= c("Alpha 1","Alpha 2","Alpha 3","Beta 1"),
id2= c("gamma 1","gamma 2","gamma 3","Delta 1") ,
y = rnorm(4))
        id     id2          y
1: Alpha 1 gamma 1 -1.1184009
2: Alpha 2 gamma 2  0.4347047
3: Alpha 3 gamma 3  0.2318315
4:  Beta 1 Delta 1  1.2640080

I would like to split my id columns every time there is a space (" "). The final dataset should look like this.

      id numberid   id2 numberid2           y
1: Alpha        1 gamma         1 -1.45772675
2: Alpha        2 gamma         2 -1.07430118
3: Alpha        3 gamma         3 -0.53454071
4:  Beta        1 Delta         1 -0.05854228

I know how to do this one column at a time:

dt_m <- dt %>%
  separate(id, into = c("id", "numberid"), sep = " ")
      id numberid     id2          y
1: Alpha        1 gamma 1  2.0789930
2: Alpha        2 gamma 2 -0.2528485
3: Alpha        3 gamma 3  0.1332267
4:  Beta        1 Delta 1  1.9299524

But I would like to iterate this using map over a number of columns. Does anyone know a convenient way to

  1. iterate with map over a set of columns, returning a data frame

  2. and use the columns both for indexing and as a character string (to paste number"id" and number"id2")?

I have tried something like this, but it produces an empty data frame:

vars <- c("id","id2")
dt2 <- dt%>%map_df(vars,~separate(.x,sep=" ", c((.x), "number")))

Thanks a lot for your help.

Upvotes: 0

Views: 978

Answers (3)

akrun

Reputation: 886938

An option with fread from data.table

library(data.table)
nm1 <- names(dt)[1:2]
nm2 <- paste0('number', nm1)
nm3 <- c(rbind(nm1, nm2))
setnames(dt[, c(list(y), lapply(.SD,  function(x) 
      fread(text = x))), .SDcols= nm1], c("y", nm3))[]
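
For comparison, here is a minimal sketch of a different data.table technique, tstrsplit() with := in a loop, rather than the fread() approach above. The fixed y values and the helper name new_cols are assumptions made only so the example is reproducible.

```r
library(data.table)

# Example data; y is fixed here (an assumption) instead of rnorm(4)
dt <- data.table(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = c(0.1, 0.2, 0.3, 0.4))

vars <- c("id", "id2")

for (v in vars) {
  # Overwrite the original column and add "number<column>" by reference
  new_cols <- c(v, paste0("number", v))
  dt[, (new_cols) := tstrsplit(dt[[v]], " ", fixed = TRUE)]
}

dt
```

The new columns are appended after y and stay character; setcolorder() can rearrange them, and tstrsplit(..., type.convert = TRUE) converts the numbers if that is wanted.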

Upvotes: 1

lroha

Reputation: 34291

I think a more typical tidyverse approach with separate() would be to pivot to long format, separate, and then pivot back to wide, but as you asked for a map() solution you can do the following. Note also that you're using a data.table, which has different indexing behavior from a data frame or tibble.

library(data.table)
library(tidyverse)

vars <- c("id","id2")

imap(vars, ~separate(dt[, .x, with = FALSE], .x, sep=" ", c(.x, paste0("numberid", .y))))  %>%
  bind_cols(dt[, setdiff(names(dt), vars), with = FALSE])

      id numberid1   id2 numberid2           y
1: Alpha         1 gamma         1 -0.69201999
2: Alpha         2 gamma         2 -0.39839537
3: Alpha         3 gamma         3 -1.24125212
4:  Beta         1 Delta         1 -0.02165367

Alternatively:

dt %>%
  rowid_to_column() %>%
  pivot_longer(-c(y, rowid)) %>%
  separate(value, c("id", "number")) %>%
  pivot_wider(names_from = name, values_from = c(id, number))
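
Another way to iterate over the columns, shown here only as a sketch, is to fold separate() over the column names with purrr::reduce(), so each step receives the data frame returned by the previous one. It assumes a plain data.frame with fixed y values (to sidestep the data.table indexing difference and keep the example reproducible); the names df, acc and v are illustrative.

```r
library(tidyr)
library(purrr)

# Plain data.frame with fixed y values (assumptions for reproducibility)
df <- data.frame(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = c(0.1, 0.2, 0.3, 0.4))

vars <- c("id", "id2")

# Start from df and apply separate() once per column name in vars
res <- reduce(
  vars,
  function(acc, v) {
    separate(acc, col = all_of(v),
             into = c(v, paste0("number", v)), sep = " ")
  },
  .init = df
)

res
```

Because separate() puts the new columns where the old one was, this gives the column order id, numberid, id2, numberid2, y requested in the question.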

Upvotes: 1

Ronak Shah

Reputation: 388807

Use cSplit from the splitstackshape package, which lets you do this for multiple columns in one go.

splitstackshape::cSplit(dt, c('id', 'id2'), sep = ' ')

#            y  id_1 id_2 id2_1 id2_2
#1:  0.4037779 Alpha    1 gamma     1
#2: -0.3753461 Alpha    2 gamma     2
#3:  0.8014951 Alpha    3 gamma     3
#4: -1.3539683  Beta    1 Delta     1

Upvotes: 3
