Reputation: 111
UPDATED: I've been searching for related questions here and can't seem to find what I'm looking for. I need to flatten some nested data. My data frame currently looks something like this:
Col_A | Col B
red | 1
red | 2
red | 4
red | 5
blue | 2
blue | 2
blue | 3
green | 1
green | 2
green | 3
green | 3
green | 7
green | 9
black | 4
orange| 1
orange| 2
However, I would like it to look like this:
A | B | C | D | E | F | G |
red | 1 | 2 | 4 | 5 | NA| NA|
blue | 2 | 2 | 3 | NA| NA| NA|
green | 1 | 2 | 3 | 3 | 7 | 9 |
black | 4 | NA| NA| NA| NA| NA|
orange| 1 | 2 | NA| NA| NA| NA|
I tried using dplyr::group_by()
but apparently I don't understand it correctly as it does nothing to my data frame. Any ideas? I assume there's a very straightforward/simple function that does this for me. If not, I guess I can try to loop it out :(
Thanks in advance for your help!
Upvotes: 1
Views: 536
Reputation: 34703
Using data.table
's built in (efficient) equivalent to reshape2
's dcast
:
library(data.table) #1.9.5+, use dcast.data.table in earlier versions
setDT(x)
> dcast(x[, .(Col_B,1:.N), by=Col_A], Col_A~V2, value.var="Col_B")
Col_A 1 2 3 4 5 6
1: black 4 NA NA NA NA NA
2: blue 2 2 3 NA NA NA
3: green 1 2 3 3 7 9
4: orange 1 2 NA NA NA NA
5: red 1 2 4 5 NA NA
If Col_A
is already stored in your data.frame
as a factor with the proper level ordering, dcast
will preserve this order, otherwise we might specify:
x$Col_A<-factor(x$Col_A, levels=unique(x$Col_A))
setDT(x)
> dcast(x[, .(Col_B,1:.N), by=Col_A], Col_A~V2, value.var="Col_B")
Col_A 1 2 3 4 5 6
1: red 1 2 4 5 NA NA
2: blue 2 2 3 NA NA NA
3: green 1 2 3 3 7 9
4: black 4 NA NA NA NA NA
5: orange 1 2 NA NA NA NA
If you'd like the names to be as you wrote in your post, use setnames
:
setnames(dcast(x[,.(Col_B,1:.N),by=Col_A],
Col_A~V2,value.var="Col_B"),
LETTERS[1:7])[]
A B C D E F G
1: red 1 2 4 5 NA NA
2: blue 2 2 3 NA NA NA
3: green 1 2 3 3 7 9
4: black 4 NA NA NA NA NA
5: orange 1 2 NA NA NA NA
Upvotes: 1
Reputation: 4123
Solution using reshape2
package.
Add column with future column names and cast the data.frame to a new data.frame.
d <- data.frame(Col_A = rep(c("red", "blue", "green", "black", "orange"), c(4, 3, 6, 1, 2)),
Col_B = c(1:4, 1:3, 1:6, 1, 1:2))
d$L <- LETTERS[d$Col_B + 1]
reshape2::dcast(d, Col_A ~ L, value.var = "Col_B")
#output
Col_A B C D E F G
1 black 1 NA NA NA NA NA
2 blue 1 2 3 NA NA NA
3 green 1 2 3 4 5 6
4 orange 1 2 NA NA NA NA
5 red 1 2 3 4 NA NA
Upvotes: 1
Reputation: 78590
You're looking for spread
in the tidyr package. If your data looks like:
d <- data.frame(Col_A = rep(c("red", "blue", "green", "black", "orange"), c(4, 3, 6, 1, 2)),
Col_B = c(1:4, 1:3, 1:6, 1, 1:2))
Then you can do:
spread(d, Col_B, Col_B)
#> Col_A 1 2 3 4 5 6
#> 1 black 1 NA NA NA NA NA
#> 2 blue 1 2 3 NA NA NA
#> 3 green 1 2 3 4 5 6
#> 4 orange 1 2 NA NA NA NA
#> 5 red 1 2 3 4 NA NA
Note that yours is a slightly odd case for spread since you're using the same values to spread across the columns and to fill the values. It looks like you want the columns to be named based on letters. One approach to this is:
d %>%
mutate(letter = LETTERS[Col_B + 1]) %>%
spread(letter, Col_B)
#> Col_A B C D E F G
#> 1 black 1 NA NA NA NA NA
#> 2 blue 1 2 3 NA NA NA
#> 3 green 1 2 3 4 5 6
#> 4 orange 1 2 NA NA NA NA
#> 5 red 1 2 3 4 NA NA
The specifics of how you rename the columns, however, depends on your particular data.
Upvotes: 5