ipuzek
ipuzek

Reputation: 13

Collapse variables into one variable with values preserving the variable names

After merging two data frames, I get this:

d.f <- data.frame(sex = c("M", "F", NA, NA),
       age = c(NA, NA, "old", "young"),
       n = c(2, 1, 3, 4))

...but would like to combine the first two variables into one, and end up with this:

sex.age      n
 sex.M       2
 sex.F       1
 age.old     3
 age.young   4

It seems like a simple (and probably common) problem, but I am really struggling to find the answer. I thought there is something ready made along the lines of tidyr and reshape, but I am not finding it - probably because the resulting structure is very untidy.

The data structure will stay the same, meaning that NA's are always in different rows. Solutions like

na.omit(unlist(d.f[1:2]))

or

apply(d.f[, 1:2], 1, function(x) x[!is.na(x)])

do work, but there is no easy way to get the variable names prefixes.

Upvotes: 0

Views: 652

Answers (2)

akrun
akrun

Reputation: 887128

Here is an option using dplyr/tidyr. We covert the 'wide' to 'long' format with gather, unite the columns 'Var' and 'Val' to create 'sex.age' and rearrange the column order.

library(dplyr)
library(tidyr)
gather(d.f, Var, Val, -n, na.rm = TRUE) %>% 
                 unite(sex.age, Var, Val, sep=".") %>%
                 select(2:1)
#    sex.age n
#1     sex.M 2
#2     sex.F 1
#7   age.old 3
#8 age.young 4

Upvotes: 0

dayne
dayne

Reputation: 7784

You can use melt from the reshape2 package.

library(reshape2)
df <- melt(d.f, id.vars = "n")
df[!is.na(df$value), ]
#   n variable value
# 1 2      sex     M
# 2 1      sex     F
# 7 3      age   old
# 8 4      age young

Upvotes: 1

Related Questions