I have a dataset that is in a somewhat unfortunate structure:
Species site 2001 2002 2003
a 1 0 1 4
a 2 1 1 0
a 3 5 5 5
b 1 3 0 4
b 2 1 1 1
b 3 4 5 5
After trying for hours to get it into the right format in R, I ended up doing it in Excel and transformed it into the format below.
ID a b
1_2001 0 3
1_2002 1 0
1_2003 4 4
2_2001 1 1
2_2002 1 1
2_2003 0 1
3_2001 5 4
3_2002 5 5
3_2003 5 5
The original dataset is rather large, and I can't let go of the fact that I don't know how to do this quickly in R. Can someone explain how this transformation can be done in R?
Here is another solution using gather() and spread() from the tidyr package:
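library(tidyr)  # attaches %>%, which tidyr re-exports from magrittr

tibble::tibble(Species = c("a", "a", "a", "b", "b", "b"),
               site = c(1L, 2L, 3L, 1L, 2L, 3L),
               `2001` = c(0L, 1L, 5L, 3L, 1L, 4L),
               `2002` = c(1L, 1L, 5L, 0L, 1L, 5L),
               `2003` = c(4L, 0L, 5L, 4L, 1L, 5L)) %>%
  # stack the year columns into key/value pairs; Species and site stay as-is
  tidyr::gather(-Species, -site, key = "key", value = "value") %>%
  # give each species its own column (a, b)
  tidyr::spread(key = "Species", value = "value")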
Output:
# A tibble: 9 x 4
site key a b
<int> <chr> <int> <int>
1 1 2001 0 3
2 1 2002 1 0
3 1 2003 4 4
4 2 2001 1 1
5 2 2002 1 1
6 2 2003 0 1
7 3 2001 5 4
8 3 2002 5 5
9 3 2003 5 5
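The output above keeps site and year in separate columns. If you want the single site_year ID from the question, one option is to append tidyr's unite() to the same pipeline. This is a sketch assuming tidyr is installed; the data frame is rebuilt with check.names = FALSE so the year columns keep their names, and res is just an illustrative name. Note that gather() and spread() are superseded in current tidyr by pivot_longer() and pivot_wider(), but they still work.

```r
library(tidyr)  # gather(), spread(), unite(); also re-exports %>%

df <- data.frame(
  Species = c("a", "a", "a", "b", "b", "b"),
  site    = c(1L, 2L, 3L, 1L, 2L, 3L),
  `2001`  = c(0L, 1L, 5L, 3L, 1L, 4L),
  `2002`  = c(1L, 1L, 5L, 0L, 1L, 5L),
  `2003`  = c(4L, 0L, 5L, 4L, 1L, 5L),
  check.names = FALSE  # keep "2001" instead of "X2001"
)

res <- df %>%
  # long: one row per Species/site/year
  gather(key = "key", value = "value", -Species, -site) %>%
  # wide: one column per species
  spread(key = "Species", value = "value") %>%
  # paste site and year into one "ID" column such as "1_2001"
  unite("ID", site, key, sep = "_")

res
```

unite() drops the source columns by default (remove = TRUE), so the result has exactly the ID, a and b columns from the question.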
Using tidyr and dplyr, you can first reshape the year columns into a longer format with pivot_longer(), then use pivot_wider() to create the "a" and "b" columns, combine "site" and the year into "ID", and finally keep only the desired columns:
library(tidyr)
library(dplyr)

df %>%
  # one row per Species/site/year; the year column names go into "ID"
  pivot_longer(-c(Species, site), names_to = "ID", values_to = "val") %>%
  # one column per species (a, b)
  pivot_wider(names_from = Species, values_from = val) %>%
  # paste() is vectorized, so rowwise() is not needed to build "site_year"
  mutate(ID = paste(site, ID, sep = "_")) %>%
  select(ID, a, b)
# A tibble: 9 x 3
ID a b
<chr> <int> <int>
1 1_2001 0 3
2 1_2002 1 0
3 1_2003 4 4
4 2_2001 1 1
5 2_2002 1 1
6 2_2003 0 1
7 3_2001 5 4
8 3_2002 5 5
9 3_2003 5 5
Data
# the .internal.selfref pointer from the data.table dput() is dropped so this parses
df <- structure(list(Species = c("a", "a", "a", "b", "b", "b"), site = c(1L,
2L, 3L, 1L, 2L, 3L), `2001` = c(0L, 1L, 5L, 3L, 1L, 4L), `2002` = c(1L,
1L, 5L, 0L, 1L, 5L), `2003` = c(4L, 0L, 5L, 4L, 1L, 5L)), row.names = c(NA,
-6L), class = "data.frame")
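For comparison, here is a minimal base R sketch that needs no packages. It assumes every column other than Species and site holds one year's values; the names years, long, m and result are illustrative. It builds the long table by hand and then lets tapply() lay the values out as an ID by Species grid.

```r
# Base R only; no packages needed.
df <- data.frame(
  Species = c("a", "a", "a", "b", "b", "b"),
  site    = c(1L, 2L, 3L, 1L, 2L, 3L),
  `2001`  = c(0L, 1L, 5L, 3L, 1L, 4L),
  `2002`  = c(1L, 1L, 5L, 0L, 1L, 5L),
  `2003`  = c(4L, 0L, 5L, 4L, 1L, 5L),
  check.names = FALSE  # keep "2001" instead of "X2001"
)

# Assumption: all remaining columns are year columns
years <- setdiff(names(df), c("Species", "site"))

# Long format: unlist() reads the year columns one after another,
# so Species and site are repeated once per year, and each year
# label is repeated once per original row
long <- data.frame(
  Species = rep(df$Species, times = length(years)),
  ID      = paste(rep(df$site, times = length(years)),
                  rep(years, each = nrow(df)), sep = "_"),
  value   = unlist(df[years], use.names = FALSE)
)

# Wide format: rows are IDs, columns are species
m <- tapply(long$value, list(long$ID, long$Species), sum)

result <- data.frame(ID = rownames(m), m, row.names = NULL)
result
```

Because each ID/species pair occurs exactly once, sum() simply returns that single value; a pair missing from the data would show up as NA rather than 0.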