Reputation: 6755
I have this data.frame
set.seed(28100)
label_1 <- sample(c('first_col','second_col'), 10, replace = T)
dat <- data.frame(label_1,
value_1 = sample(1:100, 10, replace = T),
label_2 = sapply(label_1, FUN = function(x) ifelse(x == 'first_col', 'second_col', 'first_col')),
value_2 = sample(1:100, 10, replace = T))
head(dat)
label_1 value_1 label_2 value_2
1 first_col 88 second_col 84
2 first_col 40 second_col 30
3 first_col 98 second_col 32
4 second_col 80 first_col 64
5 first_col 34 second_col 43
6 second_col 52 first_col 10
which has an inconsistent ordering of two pairs of key/value columns. I want to reshape the same data into a long format data.frame, such as:
desired_dat <- data.frame(first_col = rep(NA, 10),
second_col = rep(NA, 10))
Would be suggested to use reshape2
or tidyr
for this problem? How exactly?
Upvotes: 1
Views: 718
Reputation: 42544
As of version v1.9.6 (on CRAN 19 Sep 2015), data.table
can melt()
multiple columns simultaneously. So this goes in one chain of data.table
expressions:
library(data.table)
as.data.table(dat)[, rn := .I][
, melt(.SD, measure.vars = patterns("label", "value"))][
, dcast(.SD, rn ~ value1)][, -"rn"]
first_col second_col 1: 88 84 2: 40 30 3: 98 32 4: 64 80 5: 34 43 6: 10 52 7: 23 85 8: 65 86 9: 4 35 10: 83 8
Upvotes: 1
Reputation: 60060
This is basically @SymbolixAU's solution, just translated to dplyr
:
# Create an ID for each row: probably not necessary but useful to check
dat <- dat %>%
mutate(id = row_number())
dat_long <- bind_rows(
dat %>% select(id, label = label_1, value = value_1),
dat %>% select(id, label = label_2, value = value_2)
)
output <- dat_long %>%
spread(label, value)
Upvotes: 1
Reputation: 4534
How about just using dplyr
(no need for tidyr
etc)?
library(dplyr)
dat %>% transmute(first_col = if_else(label_1 == "first_col", value_1, value_2),
second_col = if_else(label_2 == "second_col", value_2, value_1))
#> first_col second_col
#> 1 88 84
#> 2 40 30
#> 3 98 32
#> 4 64 80
#> 5 34 43
#> 6 10 52
#> 7 23 85
#> 8 65 86
#> 9 4 35
#> 10 83 8
Upvotes: 2
Reputation: 26248
I would do this using data.table
, although the same principals can be applied to the tidyverse
as well
library(data.table)
## Setting as a data.table, and adding an 'id' value to keep track of rows
setDT(dat)
dat[, id := .I]
## then 'rbinding' the _1 and _2 columns together, with common column names
dat2 <- rbindlist(
list(
dat[, .(id, label = label_1, value = value_1)],
dat[, .(id, label = label_2, value = value_2)]
)
)
## the reshaping from long to wide to give you your desired result
dcast(dat2, formula = id ~ label)
# id first_col second_col
# 1: 1 88 84
# 2: 2 40 30
# 3: 3 98 32
# 4: 4 64 80
# 5: 5 34 43
# 6: 6 10 52
# 7: 7 23 85
# 8: 8 65 86
# 9: 9 4 35
# 10: 10 83 8
Upvotes: 1
Reputation: 6755
This is a possible solution; but not the most elegant.
myFun <- function(label1, value1, label2, value2, which_label) {
return(ifelse(label1 == which_label, value1, value2))
}
desired_dat <-
data.frame(first_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'first_col'), SIMPLIFY = TRUE),
second_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'second_col'), SIMPLIFY = TRUE))
head(desired_dat)
first_col second_col
1 88 84
2 40 30
3 98 32
4 64 80
5 34 43
6 10 52
Upvotes: 0