Display name
Display name

Reputation: 4501

Reshape data with multiple header rows in R

                Julie                           Joe                
                Measurement 1   Measurement 2   Measurement 1   Measurement 2
Part Number 1   33              32              33              31
Part Number 2   34              31              33              32
Part Number 3   33              31              30              31

How do I get this messy Excel table into a tidy format for use in R? I can utilize readr functions like spread() and gather() but this seems to need a higher level of sophistication. Here's my best reconstruction of the Excel data frame for you to mess with. Please modify it at will to better make any point.

library(tidyverse)
messy <- data.frame(
  " " = c(" ", "Part Number 1", "Part Number 2", "Part Number 3"), 
  Julie = c("Measurement 1", 33, 34, 33),
  Julie = c("Measurement 2", 32, 31, 31),
  Joe = c("Measurement 1", 33, 33, 30),
  Joe = c("Measurement 2", 31, 32, 31))

I'm open to trying the dev versions of readr or tidyr. The new pack(), unpack() etc. functions seem to be useful. Also, it appears other SO attempts/solutions aren't quite what I'm looking for 1, 2, 3, 4.

Upvotes: 2

Views: 1593

Answers (1)

zack
zack

Reputation: 5415

The tricky part is getting the correct column names and then performing gather and separate, which it sounds like you're already familiar with. Normally I do not like selecting rows by number, but in this case I think it makes sense if you're reliably reading the data from Excel in this way.

note: If a person's name has a . in it, the last step would fail.

library(tidyverse)

less_messy <- messy %>%
  mutate_all(as.character) %>% 
  set_names(c("Part",
             paste(names(.)[2:ncol(.)],
                   .[1, 2:ncol(.)],
                   sep = "-"))) %>%
  `[`(2:nrow(.), ) %>%
  gather("key", "value", contains("Measurement")) %>%
  separate("key", c("person", "measurement"), sep = "-") %>%
  mutate_at("person", ~ stringr::str_replace(.x, "\\..*",""))

less_messy
#>             Part person   measurement value
#> 1  Part Number 1  Julie Measurement 1    33
#> 2  Part Number 2  Julie Measurement 1    34
#> 3  Part Number 3  Julie Measurement 1    33
#> 4  Part Number 1  Julie Measurement 2    32
#> 5  Part Number 2  Julie Measurement 2    31
#> 6  Part Number 3  Julie Measurement 2    31
#> 7  Part Number 1    Joe Measurement 1    33
#> 8  Part Number 2    Joe Measurement 1    33
#> 9  Part Number 3    Joe Measurement 1    30
#> 10 Part Number 1    Joe Measurement 2    31
#> 11 Part Number 2    Joe Measurement 2    32
#> 12 Part Number 3    Joe Measurement 2    31

data

changed marginally because I noticed a discrepancy between what you showed and your code for the measurement numbers:

messy <- data.frame(
  " " = c(" ", "Part Number 1", "Part Number 2", "Part Number 3"), 
  Julie = c("Measurement 1", 33, 34, 33),
  Julie = c("Measurement 2", 32, 31, 31),
  Joe = c("Measurement 1", 33, 33, 30),
  Joe = c("Measurement 2", 31, 32, 31))

Upvotes: 2

Related Questions