Reputation: 4501
Julie Joe
Measurement 1 Measurement 2 Measurement 1 Measurement 2
Part Number 1 33 32 33 31
Part Number 2 34 31 33 32
Part Number 3 33 31 30 31
How do I get this messy Excel table into a tidy format for use in R? I can utilize readr functions like spread()
and gather()
but this seems to need a higher level of sophistication. Here's my best reconstruction of the Excel data frame for you to mess with. Please modify it at will to better make any point.
library(tidyverse)
messy <- data.frame(
" " = c(" ", "Part Number 1", "Part Number 2", "Part Number 3"),
Julie = c("Measurement 1", 33, 34, 33),
Julie = c("Measurement 2", 32, 31, 31),
Joe = c("Measurement 1", 33, 33, 30),
Joe = c("Measurement 2", 31, 32, 31))
I'm open to trying the dev versions of readr or tidyr. The new pack()
, unpack()
etc. functions seem to be useful. Also, it appears other SO attempts/solutions aren't quite what I'm looking for 1, 2, 3, 4.
Upvotes: 2
Views: 1593
Reputation: 5415
The tricky part is getting the correct column names and then performing gather
and separate
, which it sounds like you're already familiar with. Normally I do not like selecting rows by number, but in this case I think it makes sense if you're reliably reading the data from Excel in this way.
note: If a person's name has a .
in it, the last step would fail.
library(tidyverse)
less_messy <- messy %>%
mutate_all(as.character) %>%
set_names(c("Part",
paste(names(.)[2:ncol(.)],
.[1, 2:ncol(.)],
sep = "-"))) %>%
`[`(2:nrow(.), ) %>%
gather("key", "value", contains("Measurement")) %>%
separate("key", c("person", "measurement"), sep = "-") %>%
mutate_at("person", ~ stringr::str_replace(.x, "\\..*",""))
less_messy
#> Part person measurement value
#> 1 Part Number 1 Julie Measurement 1 33
#> 2 Part Number 2 Julie Measurement 1 34
#> 3 Part Number 3 Julie Measurement 1 33
#> 4 Part Number 1 Julie Measurement 2 32
#> 5 Part Number 2 Julie Measurement 2 31
#> 6 Part Number 3 Julie Measurement 2 31
#> 7 Part Number 1 Joe Measurement 1 33
#> 8 Part Number 2 Joe Measurement 1 33
#> 9 Part Number 3 Joe Measurement 1 30
#> 10 Part Number 1 Joe Measurement 2 31
#> 11 Part Number 2 Joe Measurement 2 32
#> 12 Part Number 3 Joe Measurement 2 31
changed marginally because I noticed a discrepancy between what you showed and your code for the measurement numbers:
messy <- data.frame(
" " = c(" ", "Part Number 1", "Part Number 2", "Part Number 3"),
Julie = c("Measurement 1", 33, 34, 33),
Julie = c("Measurement 2", 32, 31, 31),
Joe = c("Measurement 1", 33, 33, 30),
Joe = c("Measurement 2", 31, 32, 31))
Upvotes: 2