Lloyd Christmas
Lloyd Christmas

Reputation: 1038

gather on first two rows

I have some poorly formatted data that I must work with. It contains two identifiers in the first two rows, followed by the data. The data looks like:

     V1       V2       V3
1  Date 12/16/18 12/17/18
2 Equip        a        b
3    x1        1        2
4    x2        3        4
5    x3        5        6

I want to gather the data to make it tidy, but gathering only works when you have single column names. I've tried looking at spreading as well. The only solutions I've come up with are very hacky and don't feel right. Is there an elegant way to deal with this?

Here's what I want:

      Date Equip metric value
1 12/16/18     a     x1     1
2 12/16/18     a     x2     3
3 12/16/18     a     x3     5
4 12/17/18     b     x1     2
5 12/17/18     b     x2     4
6 12/17/18     b     x3     6

This approach gets me close, but I don't know how to deal with the poor formatting (no header, no row names). It should be easy to gather if the formatting was proper.

> as.data.frame(t(df))
         V1    V2 V3 V4 V5
V1     Date Equip x1 x2 x3
V2 12/16/18     a  1  3  5
V3 12/17/18     b  2  4  6

And here's the dput

structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c("12/16/18", 
"a", "1", "3", "5"), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c("V1", 
"V2", "V3"), row.names = c(NA, -5L))

Upvotes: 8

Views: 442

Answers (3)

camille
camille

Reputation: 16832

Thanks for posting a nicely reproducible question. Here's some gentle tidyr/dplyr massaging.

library(tidyr)

df %>%
    gather(key = measure, value = value, -V1) %>%
    spread(key = V1, value = value) %>%
    dplyr::select(-measure) %>%
    gather(key = metric, value = value, x1:x3) %>%
    dplyr::arrange(Date, Equip, metric)
#>       Date Equip metric value
#> 1 12/16/18     a     x1     1
#> 2 12/16/18     a     x2     3
#> 3 12/16/18     a     x3     5
#> 4 12/17/18     b     x1     2
#> 5 12/17/18     b     x2     4
#> 6 12/17/18     b     x3     6

Updated for tidyr v1.0.0:

This is just a little bit cleaner syntax with the pivot functions.

df %>%
  pivot_longer(cols = -V1) %>%
  pivot_wider(names_from = V1) %>%
  pivot_longer(cols = matches("x\\d"), names_to = "metric") %>%
  dplyr::select(-name)

Upvotes: 6

BENY
BENY

Reputation: 323226

You can using reshape

library(reshape)
row.names(df) = df$V1
df$V1 = NULL
df = melt(data.frame(t(df)),id.var = c('Date','Equip'))
df[order(df$Date),]
      Date Equip variable value
1 12/16/18     a       x1     1
3 12/16/18     a       x2     3
5 12/16/18     a       x3     5
2 12/17/18     b       x1     2
4 12/17/18     b       x2     4
6 12/17/18     b       x3     6

Upvotes: 2

Calum You
Calum You

Reputation: 15062

Here's another way starting from your approach using t(). We can replace the headers from the first row and then drop the first row, allowing just a single gather which might be more intuitive.

library(tidyverse)
df <- structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c(
  "12/16/18",
  "a", "1", "3", "5"
), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c(
  "V1",
  "V2", "V3"
), row.names = c(NA, -5L))

df %>%
  t() %>%
  `colnames<-`(.[1, ]) %>%
  `[`(-1, ) %>%
  as_tibble() %>%
  gather("metric", "value", x1:x3) %>%
  arrange(Date, Equip, metric)
#> # A tibble: 6 x 4
#>   Date     Equip metric value
#>   <chr>    <chr> <chr>  <chr>
#> 1 12/16/18 a     x1     1    
#> 2 12/16/18 a     x2     3    
#> 3 12/16/18 a     x3     5    
#> 4 12/17/18 b     x1     2    
#> 5 12/17/18 b     x2     4    
#> 6 12/17/18 b     x3     6

Created on 2018-04-20 by the reprex package (v0.2.0).

Upvotes: 1

Related Questions