How to clean up data in R?

Question

Currently, my data is in the following format:

Var   year   Co-1      Co-2     Co-3  ...
A     2018     a         j        .  
A     2017     b         k        .
A     2016     c         l        .
B     2018     d         m        .
B     2017     e         n        .
B     2016     f         o        .
C     2018     g         p        .
C     2017     h         q        .
C     2016     i         r        .
.       .      .         .        .
.       .      .         .
.       .      .         .

I want to transform it to the following format:

Company   year    A       B       C
Co-1      2018    a       d       g
Co-1      2017    b       e       h
Co-1      2016    c       f       i
Co-2      2018    j       m       p 
Co-2      2017    k       n       q
Co-2      2016    l       o       r
Co-3      2018    .       .
Co-3      2017    .       .
Co-3      2016    .       .
.
.
.

Essentially the changes are:

Repeating the company name in the first column, once for each year (2018, 2017, 2016)
Making A, B, and C the column headers, rather than repeating A, B, and C down the first column

By doing this, I want to be able to regress year vs. each of A, B, and C separately, while keeping the Company distinction for each data point, so I can group the data points by company in the finished graph.

Thank you so much!

Ronak Shah · Accepted Answer

Reshape the data to long format first, then back to wide format, this time using the values of Var as the new column names.

library(tidyr)

# Stack the company columns into name/value pairs, then spread Var into columns
df %>%
  pivot_longer(cols = starts_with("Co")) %>%
  pivot_wider(names_from = Var, values_from = value)

# A tibble: 6 x 5
#   year name  A     B     C    
#  <int> <chr> <fct> <fct> <fct>
#1  2018 Co-1  a     d     g    
#2  2018 Co-2  j     m     p    
#3  2017 Co-1  b     e     h    
#4  2017 Co-2  k     n     q    
#5  2016 Co-1  c     f     i    
#6  2016 Co-2  l     o     r
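
The output above labels the company column `name` and is ordered by year, while the question asks for a `Company` column with rows grouped by company. A minimal sketch of one way to get that layout follows. It assumes dplyr is installed alongside tidyr, and it rebuilds the toy data with character columns instead of factors for simplicity; the object name `out` is only illustrative.

```r
library(tidyr)
library(dplyr)

# Toy data matching the question (character columns, an assumption for simplicity)
df <- data.frame(
  Var  = rep(c("A", "B", "C"), each = 3),
  year = rep(c(2018L, 2017L, 2016L), times = 3),
  `Co-1` = letters[1:9],
  `Co-2` = letters[10:18],
  check.names = FALSE,
  stringsAsFactors = FALSE
)

out <- df %>%
  # names_to sets the name of the column that holds the old column headers
  pivot_longer(cols = starts_with("Co"), names_to = "Company") %>%
  pivot_wider(names_from = Var, values_from = value) %>%
  # Put Company first, then sort by company and by year, newest first
  select(Company, year, A, B, C) %>%
  arrange(Company, desc(year))

print(out)
```

Only `names_to`, `select()` and `arrange()` differ from the answer's pipeline; the reshaping logic itself is unchanged.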

data

df <- structure(list(Var = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("A", "B", "C"), class = "factor"), year = c(2018L, 
2017L, 2016L, 2018L, 2017L, 2016L, 2018L, 2017L, 2016L), `Co-1` = 
structure(1:9, .Label = c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 
class = "factor"), `Co-2` = structure(1:9, .Label = c("j", 
"k", "l", "m", "n", "o", "p", "q", "r"), class = "factor")), class = "data.frame", 
row.names = c(NA, -9L))
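
Once the data is in the reshaped layout, the regressions mentioned in the question could be sketched as below. This is only an illustration under stated assumptions: the real values are numeric (the sample data here uses placeholder letters), each of A, B, and C is the response with year as the predictor, and the data frame `wide` with its random values is made up for the example.

```r
# Hypothetical numeric data already in the (Company, year, A, B, C) layout
set.seed(1)
wide <- expand.grid(year = 2016:2018, Company = c("Co-1", "Co-2"),
                    stringsAsFactors = FALSE)
wide$A <- rnorm(nrow(wide))
wide$B <- rnorm(nrow(wide))
wide$C <- rnorm(nrow(wide))

# Fit one linear model per variable: A ~ year, B ~ year, C ~ year
fits <- lapply(c(A = "A", B = "B", C = "C"), function(v) {
  lm(reformulate("year", response = v), data = wide)
})

# Plot A against year, one colour per company, with the fitted line
company_id <- as.integer(factor(wide$Company))
plot(wide$year, wide$A, col = company_id, pch = 19,
     xlab = "year", ylab = "A")
abline(fits$A)
legend("topright", legend = levels(factor(wide$Company)),
       col = seq_along(levels(factor(wide$Company))), pch = 19)
```

Because Company stays a column of its own, it remains available for colouring or faceting the points without affecting the fits.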
