Srikar Murali
Srikar Murali

Reputation: 135

Separate single column into multiple depending on unique values

Say you have a data table that looks like this

I apologize for the poor formatting

name  time  code val
cat   2012  a2   2321
dog   1980  b2   424
cat   2101  c6   34
moose 2310  c6   5.6
dog   1972  h9   34
lion  1982  g8   12

I want it to get all of the unique codes so that the table is arranged like this

name  time  a2   b2   c6   h9   g8
cat   2012  2321 NA   NA   NA   NA
dog   1980  NA   424  NA   NA   NA
cat   2101  NA   NA   34   NA   NA
moose 2310  NA   NA   5.6  NA   NA
dog   1972  NA   NA   NA   34   NA
lion  1982  NA   NA   NA   NA   12

Could someone please describe how I would do this in R? You do not have to provide any code but some indication on how to do this would be nice. The data set I'm working with is much bigger and may contain many duplicates.

Any advice is appreciated

Thank you for reading

Upvotes: 1

Views: 194

Answers (2)

markdly
markdly

Reputation: 4534

I'm just adding the worked example here for spread as already mentioned by @Mako212:

library(tidyverse)

df <- tribble(
~name,  ~time,  ~code, ~val,
"cat",   2012,  "a2",   2321,
"dog",   1980,  "b2",   424,
"cat",   2101,  "c6",   34,
"moose", 2310,  "c6",   5.6,
"dog",   1972,  "h9",   34,
"lion",  1982,  "g8",   12)

df %>% spread(code, val)
#> # A tibble: 6 x 7
#>    name  time    a2    b2    c6    g8    h9
#> * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   cat  2012  2321    NA    NA    NA    NA
#> 2   cat  2101    NA    NA  34.0    NA    NA
#> 3   dog  1972    NA    NA    NA    NA    34
#> 4   dog  1980    NA   424    NA    NA    NA
#> 5  lion  1982    NA    NA    NA    12    NA
#> 6 moose  2310    NA    NA   5.6    NA    NA

If maintaining the row order in the example output is important then just add rownames_to_column() before using spread:

df %>% 
  rownames_to_column() %>%
  spread(code, val) 
#> # A tibble: 6 x 8
#>   rowname  name  time    a2    b2    c6    g8    h9
#> *   <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1       1   cat  2012  2321    NA    NA    NA    NA
#> 2       2   dog  1980    NA   424    NA    NA    NA
#> 3       3   cat  2101    NA    NA  34.0    NA    NA
#> 4       4 moose  2310    NA    NA   5.6    NA    NA
#> 5       5   dog  1972    NA    NA    NA    NA    34
#> 6       6  lion  1982    NA    NA    NA    12    NA

Upvotes: 1

Mako212
Mako212

Reputation: 7292

reshape2 makes these transformations pretty straightforward.

require(reshape2)

df1 <- structure(list(name = structure(c(1L, 2L, 1L, 4L, 2L, 3L), .Label = c("cat", 
"dog", "lion", "moose"), class = "factor"), time = c(2012L, 1980L, 
2101L, 2310L, 1972L, 1982L), code = structure(c(1L, 2L, 3L, 3L, 
5L, 4L), .Label = c("a2", "b2", "c6", "g8", "h9"), class = "factor"), 
    val = c(2321, 424, 34, 5.6, 34, 12)), .Names = c("name", 
"time", "code", "val"), class = "data.frame", row.names = c(NA, 
-6L))

dcast(df1, name + time ~ code, value.var="val")

   name time   a2  b2   c6 g8 h9
1   cat 2012 2321  NA   NA NA NA
2   cat 2101   NA  NA 34.0 NA NA
3   dog 1972   NA  NA   NA NA 34
4   dog 1980   NA 424   NA NA NA
5  lion 1982   NA  NA   NA 12 NA
6 moose 2310   NA  NA  5.6 NA NA

Or similarly with tidyr::spread. tidyr is a little more intuitive, using the format spread(data, <column name whose values will be used as headings>, values)

spread(df1,code,val)

Upvotes: 1

Related Questions