Reputation: 49
Suppose I have a data frame in R with binary entries for three variables (a, b and c):
library(dplyr)
df <- data.frame(a = rbinom(10, 1, 0.5), b = rbinom(10, 2, 0.3), c = rbinom(10, 4, 0.8))
df
   a b c
1  1 0 1
2  0 1 1
3  0 0 1
4  1 0 0
5  1 1 1
6  0 1 1
7  0 1 0
8  0 0 1
9  1 0 1
10 0 0 1
Then, I want to create an index considering the relative "presence" of each variable for all observations (rows), something like:
df2 <- 1/(colSums(df))
df2
a b c
0.250 0.250 0.125
Now I want to go back to df. For each column and each observation, if the value is 1, I want to replace it with the corresponding value from df2; if the original value is 0, I want to keep it. I tried a loop, but it throws an error:
for(i in 1:ncol(df)){
df[,i][df==1] <- df2[i]
}
Error in `[<-.data.frame`(`*tmp*`, , i, value = c(0.25, 0, 0, 0.25, 0.25, :
  replacement has 30 rows, data has 10
Is there an alternative way to do that?
Upvotes: 1
Views: 91
Reputation: 83215
Another option:
df2 <- data.frame(matrix(rep(1/(colSums(df)), nrow(df)),
                         byrow = TRUE, nrow = nrow(df),
                         dimnames = list(NULL, names(df))))  # keep the column names a, b, c
df2[df == 0] <- 0
which gives:
> df2
      a    b     c
1  0.25 0.00 0.125
2  0.00 0.25 0.125
3  0.00 0.00 0.125
4  0.25 0.00 0.000
5  0.25 0.25 0.125
6  0.00 0.25 0.125
7  0.00 0.25 0.000
8  0.00 0.00 0.125
9  0.25 0.00 0.125
10 0.00 0.00 0.125
Used data:
df <- structure(list(a = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L),
b = c(0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
c = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)),
.Names = c("a", "b", "c"), class = "data.frame", row.names = c(NA, -10L))
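As for why the loop in the question fails: `df == 1` compares the whole data frame and returns a 10 x 3 logical matrix (30 values), which is then used to index a single column of 10 values, hence "replacement has 30 rows, data has 10". A minimal sketch of a corrected loop, using the data above; the names `idx` and `res` are my own, not from the question:

```r
# Same data as in "Used data" above
df <- data.frame(a = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L),
                 b = c(0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
                 c = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L))

idx <- 1 / colSums(df)  # named vector, one value per column
res <- df               # work on a copy so df stays untouched

for (i in seq_along(res)) {
  # build the selector from column i only (10 values, not 30)
  res[[i]][df[[i]] == 1] <- idx[[i]]
}
res
```

With this data, `res` should match the table shown above.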
Upvotes: 2
Reputation: 2101
You could find the ones first, then overwrite them by multiplication. However, this only works if you want to replace ones, whereas @Sotos's approach works for any value.
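df_is_1 <- df == 1
# df2 is the vector 1/colSums(df) from the question; the double transpose lines it up
# with the columns (a plain df_is_1 * df2 would recycle df2 down the rows instead)
df[df_is_1] <- t(t(df_is_1) * df2)[df_is_1]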
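One thing to watch with the multiplication: R recycles a shorter vector down the rows of a matrix (column-major order), so multiplying a 10 x 3 matrix by a length-3 vector does not by itself pair each column with its own value. A minimal sketch of a column-wise alternative using `sweep()` (my substitution, not part of the answer above), with the data from the other answer and the hypothetical name `scaled`:

```r
df <- data.frame(a = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L),
                 b = c(0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
                 c = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L))

df2 <- 1 / colSums(df)   # one index value per column
df_is_1 <- df == 1       # 10 x 3 logical matrix

# sweep() over margin 2 multiplies column j by df2[j]
scaled <- sweep(df_is_1, 2, df2, `*`)

# overwrite only the cells that were 1
df[df_is_1] <- scaled[df_is_1]
df
```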
Upvotes: 1
Reputation: 51582
You can use mapply to do that, i.e.
i1 <- 1/colSums(df)
mapply(function(x, y) replace(x, x == 1, y), df, i1)
which gives,
              a    b c
 [1,] 0.0000000 0.00 4
 [2,] 0.3333333 0.25 4
 [3,] 0.0000000 0.00 4
 [4,] 0.3333333 0.00 3
 [5,] 0.0000000 0.00 3
 [6,] 0.0000000 0.00 3
 [7,] 0.0000000 0.25 4
 [8,] 0.3333333 0.25 3
 [9,] 0.0000000 0.25 4
[10,] 0.0000000 0.00 2
Note: Your df2 (my i1) values differ from mine because you did not call set.seed to make the rbinom draws reproducible.
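A reproducible sketch of the same idea, under two assumptions of mine that are not in the question: an arbitrary seed of 123, and `size = 1` in every `rbinom` call so that all entries really are binary. The name `res` is also just for illustration:

```r
set.seed(123)  # arbitrary seed, chosen only so the draws can be repeated

# size = 1 (assumption) keeps every column binary
df <- data.frame(a = rbinom(10, 1, 0.5),
                 b = rbinom(10, 1, 0.3),
                 c = rbinom(10, 1, 0.8))

i1 <- 1 / colSums(df)  # one index value per column

# replace the ones in each column with that column's index value
res <- mapply(function(x, y) replace(x, x == 1, y), df, i1)
res
```

Running this twice should give the same matrix both times, which makes the output easier to compare across answers.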
Upvotes: 4