CptNemo
CptNemo

Reputation: 6755

Wide to long data.frame merging two pairs (key/value) of columns

I have this data.frame

set.seed(28100)
label_1 <- sample(c('first_col','second_col'), 10, replace = T)
dat <- data.frame(label_1,
                  value_1 = sample(1:100, 10, replace = T),
                  label_2 = sapply(label_1, FUN = function(x) ifelse(x == 'first_col', 'second_col', 'first_col')),
                  value_2 = sample(1:100, 10, replace = T))

head(dat)
         label_1 value_1    label_2 value_2
1  first_col      88 second_col      84
2  first_col      40 second_col      30
3  first_col      98 second_col      32
4 second_col      80  first_col      64
5  first_col      34 second_col      43
6 second_col      52  first_col      10

which has an inconsistent ordering of two pairs of key/value columns. I want to reshape the same data into a long format data.frame, such as:

desired_dat <- data.frame(first_col = rep(NA, 10), 
                          second_col = rep(NA, 10))

Would be suggested to use reshape2 or tidyr for this problem? How exactly?

Upvotes: 1

Views: 718

Answers (5)

Uwe
Uwe

Reputation: 42544

As of version v1.9.6 (on CRAN 19 Sep 2015), data.table can melt() multiple columns simultaneously. So this goes in one chain of data.table expressions:

library(data.table)
as.data.table(dat)[, rn := .I][
  , melt(.SD, measure.vars = patterns("label", "value"))][
    , dcast(.SD, rn ~ value1)][, -"rn"]
    first_col second_col
 1:        88         84
 2:        40         30
 3:        98         32
 4:        64         80
 5:        34         43
 6:        10         52
 7:        23         85
 8:        65         86
 9:         4         35
10:        83          8

Upvotes: 1

Marius
Marius

Reputation: 60060

This is basically @SymbolixAU's solution, just translated to dplyr:

# Create an ID for each row: probably not necessary but useful to check
dat <- dat %>%
    mutate(id = row_number())

dat_long <- bind_rows(
    dat %>% select(id, label = label_1, value = value_1),
    dat %>% select(id, label = label_2, value = value_2)
)

output <- dat_long %>%
    spread(label, value)

Upvotes: 1

markdly
markdly

Reputation: 4534

How about just using dplyr (no need for tidyr etc)?

library(dplyr)
dat %>% transmute(first_col = if_else(label_1 == "first_col", value_1, value_2),
                  second_col = if_else(label_2 == "second_col", value_2, value_1))

#>    first_col second_col
#> 1         88         84
#> 2         40         30
#> 3         98         32
#> 4         64         80
#> 5         34         43
#> 6         10         52
#> 7         23         85
#> 8         65         86
#> 9          4         35
#> 10        83          8

Upvotes: 2

SymbolixAU
SymbolixAU

Reputation: 26248

I would do this using data.table, although the same principals can be applied to the tidyverse as well

library(data.table)

## Setting as a data.table, and adding an 'id' value to keep track of rows
setDT(dat)
dat[, id := .I]


## then 'rbinding' the _1 and _2 columns together, with common column names
dat2 <- rbindlist(
    list(
        dat[, .(id, label = label_1, value = value_1)], 
        dat[, .(id, label = label_2, value = value_2)]
        )
)

## the reshaping from long to wide to give you your desired result
dcast(dat2, formula = id ~ label)
#     id first_col second_col
# 1:   1        88         84
# 2:   2        40         30
# 3:   3        98         32
# 4:   4        64         80
# 5:   5        34         43
# 6:   6        10         52
# 7:   7        23         85
# 8:   8        65         86
# 9:   9         4         35
# 10: 10        83          8

Upvotes: 1

CptNemo
CptNemo

Reputation: 6755

This is a possible solution; but not the most elegant.

myFun <- function(label1, value1, label2, value2, which_label) {
  return(ifelse(label1 == which_label, value1, value2))
}

desired_dat <- 
  data.frame(first_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'first_col'), SIMPLIFY = TRUE), 
             second_col = mapply(FUN = myFun, dat$label_1, dat$value_1, dat$label_2, dat$value_2, MoreArgs = list(which_label = 'second_col'), SIMPLIFY = TRUE))

head(desired_dat)


first_col second_col
1        88         84
2        40         30
3        98         32
4        64         80
5        34         43
6        10         52

Upvotes: 0

Related Questions