Reputation: 57
Would anyone mind lending some knowledge? I am trying to build a new data frame based on the values in the data frame below.
id value
ant 10
cat 4
cat 6
dog 5
dog 3
dog 2
fly 9
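Next, going through the rows in order, I want to build a data frame with one column per id. Each column should alternate between runs of 0s and 1s, starting with 0, where each run's length comes from value; columns that come up short (like fly, with only 9 values) are padded with NA.
It should look like this: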
ant cat dog fly
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 1 0 0
0 1 1 0
0 1 1 0
0 1 1 0
0 1 0 0
0 1 0 NA
I know how to do this the long way:
newdf <- data.frame(matrix(2, ncol = length(unique(df[,"id"])) , nrow = 10))
newdf$X1[1:10] <- 0
newdf$X2[1:4] <- 0
newdf$X2[5:10] <- 1
...
However, is there a more efficient way to do this? My actual data will have roughly 50 rows, which is why I am looking for something less manual.
Upvotes: 3
Views: 146
Reputation: 388982
Here's a tidyverse answer:
library(dplyr)
library(tidyr)
df %>%
  group_by(id) %>%
  mutate(val = rep(c(0, 1), length.out = n())) %>%
  uncount(value) %>%
  mutate(row = row_number()) %>%
  complete(row = 1:10) %>%
  pivot_wider(names_from = id, values_from = val) %>%
  select(-row)
# ant cat dog fly
# <dbl> <dbl> <dbl> <dbl>
# 1 0 0 0 0
# 2 0 0 0 0
# 3 0 0 0 0
# 4 0 0 0 0
# 5 0 1 0 0
# 6 0 1 1 0
# 7 0 1 1 0
# 8 0 1 1 0
# 9 0 1 0 0
#10 0 1 0 NA
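For each id we assign alternating 0, 1 values to its rows and use uncount to repeat each row as many times as its value. We then number the rows within each id, use complete to fill in any missing row numbers up to 10 (which produces the NA), and reshape the data to wide format so that there is a separate column for each id.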
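To make the explanation above concrete, here is a small sketch that traces only the "cat" rows through the first half of the pipeline. The names cat_df and long are mine, and the extra ungroup() before uncount() is an assumption added to keep the example simple; it is not part of the answer's code.

```r
library(dplyr)
library(tidyr)

# Only the "cat" rows from the question: a run of 4, then a run of 6.
cat_df <- data.frame(id = c("cat", "cat"), value = c(4, 6))

long <- cat_df %>%
  group_by(id) %>%
  mutate(val = rep(c(0, 1), length.out = n())) %>%  # 0 for the 1st run, 1 for the 2nd
  ungroup() %>%                                     # illustrative choice, not in the answer
  uncount(value) %>%                                # repeat each row `value` times
  group_by(id) %>%
  mutate(row = row_number()) %>%                    # position within each id
  ungroup()

long$val
# 0 0 0 0 1 1 1 1 1 1
```

This is the cat column of the expected result; pivot_wider then places one such vector per id side by side, matched on row.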
data
df <- structure(list(id = c("ant", "cat", "cat", "dog", "dog", "dog",
"fly"), value = c(10, 4, 6, 5, 3, 2, 9)), row.names = c(NA, -7L
), class = "data.frame")
Upvotes: 1
Reputation: 101337
You can try the following base R code (list2DF requires R 4.0.0 or later):
maxlen <- with(df, max(tapply(value, id, sum)))
list2DF(
  lapply(
    with(df, split(value, id)),
    function(x) {
      `length<-`(
        rep(rep(c(0, 1), length.out = length(x)), x),
        maxlen
      )
    }
  )
)
which gives
ant cat dog fly
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 1 0 0
6 0 1 1 0
7 0 1 1 0
8 0 1 1 0
9 0 1 0 0
10 0 1 0 NA
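As a rough illustration of what the anonymous function does for a single id, the sketch below unpacks it for "dog" and "fly". The variable names (x, flags, runs, fly_runs, padded) are mine and do not appear in the answer.

```r
# Run lengths for "dog", taken from the question's data.
x <- c(5, 3, 2)

# One flag per run, alternating and starting at 0.
flags <- rep(c(0, 1), length.out = length(x))
flags
# 0 1 0

# Repeat each flag as many times as its run length.
runs <- rep(flags, times = x)
runs
# 0 0 0 0 0 1 1 1 0 0

# "fly" has a single run of 9, so it is one element short of maxlen = 10.
fly_runs <- rep(0, times = 9)

# `length<-` extends a vector to the requested length, padding with NA.
padded <- `length<-`(fly_runs, 10)
padded
# 0 0 0 0 0 0 0 0 0 NA
```

Because every column is padded to the same length, list2DF can combine the per-id vectors into one data frame.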
Upvotes: 1