anonymous

Reputation: 57

Change multiple values in a dataframe based on two other values

Could anyone lend some knowledge? I am trying to build a new data frame from the values in the data frame below.

id   value
ant    10
cat    4
cat    6
dog    5
dog    3
dog    2
fly    9

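For each id, I want a column that expands its values, in order, into runs of alternating 0s and 1s (starting with 0), where each value is the length of its run. Columns shorter than the longest one are padded with NA. It should look like this: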

ant  cat  dog  fly
0    0    0    0
0    0    0    0
0    0    0    0
0    0    0    0
0    1    0    0
0    1    1    0
0    1    1    0
0    1    1    0
0    1    0    0
0    1    0    NA

I know how to do this the long way:

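# One column per unique id, 10 rows, filled with a placeholder value
newdf <- data.frame(matrix(2, ncol = length(unique(df[, "id"])), nrow = 10))
newdf$X1[1:10] <- 0  # ant: 10 zeros
newdf$X2[1:4] <- 0   # cat: 4 zeros ...
newdf$X2[5:10] <- 1  # ... then 6 ones
...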

Is there a more efficient way to do this? My actual data has roughly 50 rows, so filling each column by hand is not practical.

Upvotes: 3

Views: 146

Answers (2)

Ronak Shah

Reputation: 388982

Here's a tidyverse answer:

library(dplyr)
library(tidyr)

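df %>%
  group_by(id) %>%
  mutate(val = rep(c(0, 1), length.out = n())) %>%  # alternate 0, 1, 0, ... across the rows of each id
  uncount(value) %>%                                # repeat each row `value` times
  mutate(row = row_number()) %>%                    # row index within each id
  complete(row = 1:10) %>%                          # pad each id to 10 rows; missing val becomes NA
  pivot_wider(names_from = id, values_from = val) %>%  # one column per id
  select(-row)                                      # drop the helper index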

#     ant   cat   dog   fly
#   <dbl> <dbl> <dbl> <dbl>
# 1     0     0     0     0
# 2     0     0     0     0
# 3     0     0     0     0
# 4     0     0     0     0
# 5     0     1     0     0
# 6     0     1     1     0
# 7     0     1     1     0
# 8     0     1     1     0
# 9     0     1     0     0
#10     0     1     0    NA

For each id, we assign alternating 0/1 values to its rows and use uncount() to repeat each row as many times as its value. complete() then pads every id to 10 rows with NA, and pivot_wider() reshapes the data to wide format so that each id gets its own column.
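The 10 in complete(row = 1:10) is specific to this example. As a minimal sketch, assuming the same df as in the data section of this answer, the row count could instead be derived as the largest per-id total of value. The names totals and n_rows are made up for illustration and are not part of the original answer.

```r
# Same example data as in the question
df <- data.frame(
  id = c("ant", "cat", "cat", "dog", "dog", "dog", "fly"),
  value = c(10, 4, 6, 5, 3, 2, 9)
)

# Number of output rows each id needs (sum of its run lengths)
totals <- tapply(df$value, df$id, sum)

# The longest column decides how many rows the result has
n_rows <- max(totals)
n_rows
```

For this data n_rows is 10, so complete(row = seq_len(n_rows)) should behave the same as the hard-coded complete(row = 1:10).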

data

df <- structure(list(id = c("ant", "cat", "cat", "dog", "dog", "dog", 
"fly"), value = c(10, 4, 6, 5, 3, 2, 9)), row.names = c(NA, -7L
), class = "data.frame")

Upvotes: 1

ThomasIsCoding

Reputation: 101337

You can try the following base R code (list2DF() requires R >= 4.0.0):

maxlen <- with(df, max(tapply(value, id, sum)))
list2DF(
  lapply(
    with(df, split(value, id)),
    function(x) {
      `length<-`(
        rep(rep(c(0, 1), length.out = length(x)), x),
        maxlen
      )
    }
  )
)

which gives

   ant cat dog fly
1    0   0   0   0
2    0   0   0   0
3    0   0   0   0
4    0   0   0   0
5    0   1   0   0
6    0   1   1   0
7    0   1   1   0
8    0   1   1   0
9    0   1   0   0
10   0   1   0  NA
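To make the nested calls easier to follow, here is a step-by-step sketch of what the anonymous function appears to do for a single id. The helper name expand_runs is hypothetical and only used for illustration; the steps mirror the rep() and `length<-` calls above.

```r
# Hypothetical helper: expand run lengths x into alternating 0/1 runs,
# then pad (or truncate) the result to maxlen entries
expand_runs <- function(x, maxlen) {
  # One 0/1 flag per run, starting with 0: 0, 1, 0, ...
  pattern <- rep(c(0, 1), length.out = length(x))
  # Repeat each flag as many times as its run length
  runs <- rep(pattern, times = x)
  # Growing a vector with length<- fills the new slots with NA
  length(runs) <- maxlen
  runs
}

dog <- expand_runs(c(5, 3, 2), 10)  # 5 zeros, 3 ones, 2 zeros
fly <- expand_runs(9, 10)           # 9 zeros, then one NA as padding
```

Applying this to every element of split(value, id) and passing the resulting list to list2DF() gives one column per id.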

Upvotes: 1
