Reputation: 3
Currently, my data is in the following format:
Var year Co-1 Co-2 Co-3 ...
A 2018 a j .
A 2017 b k .
A 2016 c l .
B 2018 d m .
B 2017 e n .
B 2016 f o .
C 2018 g p .
C 2017 h q .
C 2016 i r .
. . . . .
. . . .
. . . .
I want to transform it to the following format:
Company year A B C
Co-1 2018 a d g
Co-1 2017 b e h
Co-1 2016 c f i
Co-2 2018 j m p
Co-2 2017 k n q
Co-2 2016 l o r
Co-3 2018 . .
Co-3 2017 . .
Co-3 2016 . .
.
.
.
Essentially the changes are:
By doing this, I want to be able to regress year vs. each of A, B, and C separately, while keeping the Company distinction for each data point, so I can group the data points by company in the finished graph.
Thank you so much!
Upvotes: 0
Views: 110
Reputation: 73692
I think it would be much easier to simply use factors here. Length of labels should be the number of unique companies, i.e. "Var"
values
df$Var <- factor(df$Var, labels=c("Pepsi", "Coke", "Sprite"))
names(df) <- c("company", "year", LETTERS[seq(names(df)[-(1:2)])])
Or in a single step:
df <- setNames(transform(df, Var=factor(Var, labels=c("Pepsi", "Coke", "Sprite"))),
c("company", "year", LETTERS[seq(names(df)[-(1:2)])]))
df
# company year A B C
# 1 Pepsi 2018 a j .
# 2 Pepsi 2017 b k .
# 3 Pepsi 2016 c l .
# 4 Coke 2018 d m .
# 5 Coke 2017 e n .
# 6 Coke 2016 f o .
# 7 Sprite 2018 g p .
# 8 Sprite 2017 h q .
# 9 Sprite 2016 i r .
Also produces cleaner classes:
sapply(df, class)
# company year A B C
# "factor" "integer" "character" "character" "character"
Upvotes: 1
Reputation: 389275
Get the data in long format and then in wide but with different columns.
library(tidyr)
df %>%
pivot_longer(cols = starts_with("Co")) %>%
pivot_wider(names_from = Var, values_from = value)
# A tibble: 6 x 5
# year name A B C
# <int> <chr> <fct> <fct> <fct>
#1 2018 Co-1 a d g
#2 2018 Co-2 j m p
#3 2017 Co-1 b e h
#4 2017 Co-2 k n q
#5 2016 Co-1 c f i
#6 2016 Co-2 l o r
data
df <- structure(list(Var = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("A", "B", "C"), class = "factor"), year = c(2018L,
2017L, 2016L, 2018L, 2017L, 2016L, 2018L, 2017L, 2016L), `Co-1` =
structure(1:9, .Label = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
class = "factor"), `Co-2` = structure(1:9, .Label = c("j",
"k", "l", "m", "n", "o", "p", "q", "r"), class = "factor")), class = "data.frame",
row.names = c(NA, -9L))
Upvotes: 2