donkeyshark
donkeyshark

Reputation: 111

flatten nested data in R

UPDATED: I've been searching for related questions here and can't seem to find what I'm looking for. I need to flatten some nested data. My data frame currently looks something like this:

Col_A | Col B
red   | 1
red   | 2
red   | 4
red   | 5
blue  | 2
blue  | 2
blue  | 3
green | 1
green | 2
green | 3
green | 3
green | 7
green | 9
black | 4
orange| 1
orange| 2

However, I would like it to look like this:

A     | B | C | D | E | F | G |
red   | 1 | 2 | 4 | 5 | NA| NA|
blue  | 2 | 2 | 3 | NA| NA| NA|
green | 1 | 2 | 3 | 3 | 7 | 9 |
black | 4 | NA| NA| NA| NA| NA|
orange| 1 | 2 | NA| NA| NA| NA|

I tried using dplyr::group_by() but apparently I don't understand it correctly as it does nothing to my data frame. Any ideas? I assume there's a very straightforward/simple function that does this for me. If not, I guess I can try to loop it out :(

Thanks in advance for your help!

Upvotes: 1

Views: 536

Answers (3)

MichaelChirico
MichaelChirico

Reputation: 34703

Using data.table's built in (efficient) equivalent to reshape2's dcast:

library(data.table) #1.9.5+, use dcast.data.table in earlier versions
setDT(x)
> dcast(x[, .(Col_B,1:.N), by=Col_A], Col_A~V2, value.var="Col_B")
    Col_A 1  2  3  4  5  6
1: black  4 NA NA NA NA NA
2: blue   2  2  3 NA NA NA
3: green  1  2  3  3  7  9
4: orange 1  2 NA NA NA NA
5: red    1  2  4  5 NA NA

If Col_A is already stored in your data.frame as a factor with the proper level ordering, dcast will preserve this order, otherwise we might specify:

x$Col_A<-factor(x$Col_A, levels=unique(x$Col_A))
setDT(x)
> dcast(x[, .(Col_B,1:.N), by=Col_A], Col_A~V2, value.var="Col_B")
    Col_A 1  2  3  4  5  6
1: red    1  2  4  5 NA NA
2: blue   2  2  3 NA NA NA
3: green  1  2  3  3  7  9
4: black  4 NA NA NA NA NA
5: orange 1  2 NA NA NA NA

If you'd like the names to be as you wrote in your post, use setnames:

setnames(dcast(x[,.(Col_B,1:.N),by=Col_A],
               Col_A~V2,value.var="Col_B"),
         LETTERS[1:7])[]
        A B  C  D  E  F  G
1: red    1  2  4  5 NA NA
2: blue   2  2  3 NA NA NA
3: green  1  2  3  3  7  9
4: black  4 NA NA NA NA NA
5: orange 1  2 NA NA NA NA

Upvotes: 1

DrDom
DrDom

Reputation: 4123

Solution using reshape2 package. Add column with future column names and cast the data.frame to a new data.frame.

d <- data.frame(Col_A = rep(c("red", "blue", "green", "black", "orange"), c(4, 3, 6, 1, 2)), 
                Col_B = c(1:4, 1:3, 1:6, 1, 1:2))
d$L <- LETTERS[d$Col_B + 1]
reshape2::dcast(d, Col_A ~ L, value.var = "Col_B")

#output
   Col_A B  C  D  E  F  G
1  black 1 NA NA NA NA NA
2   blue 1  2  3 NA NA NA
3  green 1  2  3  4  5  6
4 orange 1  2 NA NA NA NA
5    red 1  2  3  4 NA NA

Upvotes: 1

David Robinson
David Robinson

Reputation: 78590

You're looking for spread in the tidyr package. If your data looks like:

d <- data.frame(Col_A = rep(c("red", "blue", "green", "black", "orange"), c(4, 3, 6, 1, 2)),
                Col_B = c(1:4, 1:3, 1:6, 1, 1:2))

Then you can do:

spread(d, Col_B, Col_B)
#>    Col_A 1  2  3  4  5  6
#> 1  black 1 NA NA NA NA NA
#> 2   blue 1  2  3 NA NA NA
#> 3  green 1  2  3  4  5  6
#> 4 orange 1  2 NA NA NA NA
#> 5    red 1  2  3  4 NA NA

Note that yours is a slightly odd case for spread since you're using the same values to spread across the columns and to fill the values. It looks like you want the columns to be named based on letters. One approach to this is:

d %>%
    mutate(letter = LETTERS[Col_B + 1]) %>%
    spread(letter, Col_B)
#>    Col_A B  C  D  E  F  G
#> 1  black 1 NA NA NA NA NA
#> 2   blue 1  2  3 NA NA NA
#> 3  green 1  2  3  4  5  6
#> 4 orange 1  2 NA NA NA NA
#> 5    red 1  2  3  4 NA NA

The specifics of how you rename the columns, however, depends on your particular data.

Upvotes: 5

Related Questions