Reputation: 135
Say you have a data table that looks like this
I apologize for the poor formatting
name time code val
cat 2012 a2 2321
dog 1980 b2 424
cat 2101 c6 34
moose 2310 c6 5.6
dog 1972 h9 34
lion 1982 g8 12
I want it to get all of the unique codes so that the table is arranged like this
name time a2 b2 c6 h9 g8
cat 2012 2321 NA NA NA NA
dog 1980 NA 424 NA NA NA
cat 2101 NA NA 34 NA NA
moose 2310 NA NA 5.6 NA NA
dog 1972 NA NA NA 34 NA
lion 1982 NA NA NA NA 12
Could someone please describe how I would do this in R? You do not have to provide any code but some indication on how to do this would be nice. The data set I'm working with is much bigger and may contain many duplicates.
Any advice is appreciated
Thank you for reading
Upvotes: 1
Views: 194
Reputation: 4534
I'm just adding the worked example here for spread
as already mentioned by @Mako212:
library(tidyverse)
df <- tribble(
~name, ~time, ~code, ~val,
"cat", 2012, "a2", 2321,
"dog", 1980, "b2", 424,
"cat", 2101, "c6", 34,
"moose", 2310, "c6", 5.6,
"dog", 1972, "h9", 34,
"lion", 1982, "g8", 12)
df %>% spread(code, val)
#> # A tibble: 6 x 7
#> name time a2 b2 c6 g8 h9
#> * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 cat 2012 2321 NA NA NA NA
#> 2 cat 2101 NA NA 34.0 NA NA
#> 3 dog 1972 NA NA NA NA 34
#> 4 dog 1980 NA 424 NA NA NA
#> 5 lion 1982 NA NA NA 12 NA
#> 6 moose 2310 NA NA 5.6 NA NA
If maintaining the row order in the example output is important then just add rownames_to_column()
before using spread
:
df %>%
rownames_to_column() %>%
spread(code, val)
#> # A tibble: 6 x 8
#> rowname name time a2 b2 c6 g8 h9
#> * <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 cat 2012 2321 NA NA NA NA
#> 2 2 dog 1980 NA 424 NA NA NA
#> 3 3 cat 2101 NA NA 34.0 NA NA
#> 4 4 moose 2310 NA NA 5.6 NA NA
#> 5 5 dog 1972 NA NA NA NA 34
#> 6 6 lion 1982 NA NA NA 12 NA
Upvotes: 1
Reputation: 7292
reshape2
makes these transformations pretty straightforward.
require(reshape2)
df1 <- structure(list(name = structure(c(1L, 2L, 1L, 4L, 2L, 3L), .Label = c("cat",
"dog", "lion", "moose"), class = "factor"), time = c(2012L, 1980L,
2101L, 2310L, 1972L, 1982L), code = structure(c(1L, 2L, 3L, 3L,
5L, 4L), .Label = c("a2", "b2", "c6", "g8", "h9"), class = "factor"),
val = c(2321, 424, 34, 5.6, 34, 12)), .Names = c("name",
"time", "code", "val"), class = "data.frame", row.names = c(NA,
-6L))
dcast(df1, name + time ~ code, value.var="val")
name time a2 b2 c6 g8 h9
1 cat 2012 2321 NA NA NA NA
2 cat 2101 NA NA 34.0 NA NA
3 dog 1972 NA NA NA NA 34
4 dog 1980 NA 424 NA NA NA
5 lion 1982 NA NA NA 12 NA
6 moose 2310 NA NA 5.6 NA NA
Or similarly with tidyr::spread
. tidyr
is a little more intuitive, using the format spread(data, <column name whose values will be used as headings>, values)
spread(df1,code,val)
Upvotes: 1