Reputation: 1294
I would like to use map() from the purrr package to iterate over a subset of variables in my data frame. Is there a standard and convenient approach for that?
Take the following example dataset:
library(data.table)
library(purrr)
dt <- data.table(id= c("Alpha 1","Alpha 2","Alpha 3","Beta 1"),
id2= c("gamma 1","gamma 2","gamma 3","Delta 1") ,
y = rnorm(4))
id id2 y
1: Alpha 1 gamma 1 -1.1184009
2: Alpha 2 gamma 2 0.4347047
3: Alpha 3 gamma 3 0.2318315
4: Beta 1 Delta 1 1.2640080
I would like to split my id columns every time there is a space (" "). The final dataset should look like this:
id numberid id2 numberid2 y
1: Alpha 1 gamma 1 -1.45772675
2: Alpha 2 gamma 2 -1.07430118
3: Alpha 3 gamma 3 -0.53454071
4: Beta 1 Delta 1 -0.05854228
I know how to do this one column at a time:
library(tidyr)  # separate() comes from tidyr, not purrr
dt_m <- dt %>%
  separate(id, sep = " ", into = c("id", "numberid"))
id numberid id2 y
1: Alpha 1 gamma 1 2.0789930
2: Alpha 2 gamma 2 -0.2528485
3: Alpha 3 gamma 3 0.1332267
4: Beta 1 Delta 1 1.9299524
But I would like to iterate this with map() over a number of columns. Does anyone know a convenient way to iterate with map() over a set of columns, returning a data frame and using each column name both for indexing and as a character string (to build the new names numberid and numberid2)?
I have tried something like this, but it produces an empty data frame:
vars <- c("id","id2")
dt2 <- dt%>%map_df(vars,~separate(.x,sep=" ", c((.x), "number")))
Thanks a lot for your help!
Upvotes: 0
Views: 978
Reputation: 886938
An option with fread from data.table:
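library(data.table)
nm1 <- names(dt)[1:2]          # columns to split: "id", "id2"
nm2 <- paste0('number', nm1)   # "numberid", "numberid2"
nm3 <- c(rbind(nm1, nm2))      # interleaved: id, numberid, id2, numberid2
# fread(text = x) parses each column as space-delimited text, giving two columns per id column
setnames(dt[, c(list(y), lapply(.SD, function(x) fread(text = x))),
            .SDcols = nm1],
         c("y", nm3))[]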
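The renaming step relies on c(rbind(nm1, nm2)) to interleave the two name vectors: rbind() stacks them into a 2-row matrix, and c() flattens that matrix column by column. A minimal base R check, using a small stand-in data frame (hypothetical, only its column names matter) with the same names as dt:

```r
# Stand-in with the same column names as dt; the values are irrelevant here
df <- data.frame(id = "Alpha 1", id2 = "gamma 1", y = 0)

nm1 <- names(df)[1:2]          # "id" "id2"
nm2 <- paste0("number", nm1)   # "numberid" "numberid2"

# rbind() gives a 2 x 2 matrix (row 1 = nm1, row 2 = nm2);
# c() reads it column-major, so the names alternate
nm3 <- c(rbind(nm1, nm2))
print(nm3)
```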
Upvotes: 1
Reputation: 34291
I think a more typical tidyverse approach using separate() would be to pivot to long format, separate, and then pivot back to wide, but since you asked for a map() solution, you can do the following. Note also that you're using a data.table, which has different indexing behavior from a data frame or tibble.
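library(data.table)
library(tidyverse)
vars <- c("id","id2")
# Split each column in vars; .y is the position of .x in vars (1, 2)
imap(vars, ~separate(dt[, .x, with = FALSE], .x, sep = " ", c(.x, paste0("numberid", .y)))) %>%
  # add back the columns that were not split
  bind_cols(dt[, setdiff(names(dt), vars), with = FALSE])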
id numberid1 id2 numberid2 y
1: Alpha 1 gamma 1 -0.69201999
2: Alpha 2 gamma 2 -0.39839537
3: Alpha 3 gamma 3 -1.24125212
4: Beta 1 Delta 1 -0.02165367
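The numberid1/numberid2 names come from .y, which imap() sets to the position of each element in vars. If you want the names from the question (numberid, numberid2), one variant is to build the new name from the column name .x itself with plain map(). A sketch, assuming tidyr and dplyr are installed; it converts dt to a plain data frame first to sidestep data.table's with = FALSE indexing, and uses all_of() so the column name held in .x is selected unambiguously:

```r
library(data.table)
library(purrr)
library(tidyr)   # separate(), all_of()
library(dplyr)   # bind_cols()

dt <- data.table(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = rnorm(4))
vars <- c("id", "id2")
df <- as.data.frame(dt)

# For each name in vars: take that single column, split it on " ",
# and name the pieces <name> and number<name>
pieces <- map(vars, ~ separate(df[, .x, drop = FALSE], all_of(.x),
                               into = c(.x, paste0("number", .x)), sep = " "))

# Glue the split columns back together with the columns that were not split
res <- bind_cols(pieces, df[setdiff(names(df), vars)])
res
```

The split parts stay character; add convert = TRUE to separate() if you want the numbers as integers.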
Alternatively, with the pivot approach:
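dt %>%
  rowid_to_column() %>%                    # remember which row each value came from
  pivot_longer(-c(y, rowid)) %>%           # one row per original row and id column
  separate(value, c("id", "number")) %>%   # default sep splits on non-alphanumerics, e.g. the space
  pivot_wider(names_from = name, values_from = c(id, number))   # columns named <value>_<name>, e.g. id_id2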
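Another option that avoids the bind_cols() step is to fold separate() over the column names with purrr::reduce(), so each call works on the output of the previous one. This is a swapped-in technique rather than part of the answer above; a sketch assuming tidyr's separate() and all_of():

```r
library(data.table)
library(purrr)
library(tidyr)

dt <- data.table(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = rnorm(4))
vars <- c("id", "id2")

# Start from a plain data frame and split one column per step;
# v is used both to pick the column and to build its new names.
# separate() inserts the new columns where the old one was.
res <- reduce(vars, function(d, v) {
  separate(d, all_of(v), into = c(v, paste0("number", v)), sep = " ")
}, .init = as.data.frame(dt))
res
```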
Upvotes: 1
Reputation: 388807
Use cSplit from the splitstackshape package, which lets you split multiple columns in one go:
splitstackshape::cSplit(dt, c('id', 'id2'), sep = ' ')
# y id_1 id_2 id2_1 id2_2
#1: 0.4037779 Alpha 1 gamma 1
#2: -0.3753461 Alpha 2 gamma 2
#3: 0.8014951 Alpha 3 gamma 3
#4: -1.3539683 Beta 1 Delta 1
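cSplit() names the pieces id_1, id_2, id2_1, id2_2. If you want the names from the question instead, one way is to rename them afterwards with data.table's setnames(); a sketch, assuming the _1/_2 suffixes shown in the output above:

```r
library(data.table)
library(splitstackshape)

dt <- data.table(id  = c("Alpha 1", "Alpha 2", "Alpha 3", "Beta 1"),
                 id2 = c("gamma 1", "gamma 2", "gamma 3", "Delta 1"),
                 y   = rnorm(4))
vars <- c("id", "id2")

# cSplit() appends _1, _2 to each split column's name
res <- cSplit(dt, vars, sep = " ")

# Rename <col>_1 -> <col> and <col>_2 -> number<col> (by reference)
setnames(res, old = paste0(vars, "_1"), new = vars)
setnames(res, old = paste0(vars, "_2"), new = paste0("number", vars))
res
```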
Upvotes: 3