Replace values for columns in dataframe by external numeric vector

Let's suppose I have a data frame in R with binary entries for three variables (a, b and c):

library(dplyr)
df <- data.frame(a = rbinom(10, 1, 0.5), b = rbinom(10, 2, 0.3), c = rbinom(10, 4, 0.8))

df
   a b c
1  1 0 1
2  0 1 1
3  0 0 1
4  1 0 0
5  1 1 1
6  0 1 1
7  0 1 0
8  0 0 1
9  1 0 1
10 0 0 1

Then, I want to create an index considering the relative "presence" of each variable for all observations (rows), something like:

df2 <- 1/(colSums(df))

df2

  a     b     c 
0.250 0.250 0.125

Now I want to go back to df: in each column, every entry equal to 1 should be replaced by that column's value from df2, while entries equal to 0 should stay as they are. I tried a loop, but it fails with an error:

for(i in 1:ncol(df)){

  df[,i][df==1] <- df2[i]

} 

Error in [<-.data.frame(*tmp*, , i, value = c(0.25, 0, 0, 0.25, 0.25, : replacement has 30 rows, data has 10

Is there an alternative way to do that?

Upvotes: 1

Views: 91

Answers (3)

Jaap

Reputation: 83215

Another option:

df2 <- data.frame(matrix(rep(1/(colSums(df)), nrow(df)),
                         byrow = TRUE, nrow = nrow(df),
                         dimnames = list(NULL, names(df))))  # keep the column names a, b, c

df2[df == 0] <- 0

which gives:

> df2
      a    b     c
1  0.25 0.00 0.125
2  0.00 0.25 0.125
3  0.00 0.00 0.125
4  0.25 0.00 0.000
5  0.25 0.25 0.125
6  0.00 0.25 0.125
7  0.00 0.25 0.000
8  0.00 0.00 0.125
9  0.25 0.00 0.125
10 0.00 0.00 0.125

Used data:

df <- structure(list(a = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L), 
                     b = c(0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L),
                     c = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)),
                .Names = c("a", "b", "c"), class = "data.frame", row.names = c(NA, -10L))
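
Because the data here contain only 0 and 1, the same result can also be obtained by multiplying each column by its own weight. The following is only a sketch under that assumption (it would not give the intended result if other values were present); the names w and res are introduced for illustration:

```r
# Binary example data, same values as the df used above
df <- data.frame(a = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0),
                 b = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0),
                 c = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1))

# Per-column weight: a = 0.25, b = 0.25, c = 0.125
w <- 1 / colSums(df)

# Multiply every column (MARGIN = 2) by its own weight; zeros stay zero
res <- as.data.frame(sweep(as.matrix(df), 2, w, `*`))
```

sweep() pairs each column with the matching element of w, so no manual repetition of the weights is needed.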

Upvotes: 2

Tino

Reputation: 2101

You could find the ones first, then overwrite them by multiplication. This, however, only works if you want to replace ones, whereas @Sotos's approach works for any value.

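df_is_1 <- df == 1
# expand df2 to one value per cell with col(); a bare df_is_1 * df2 would
# recycle df2 down each column instead of across the columns
df[df_is_1] <- (df_is_1 * df2[col(df_is_1)])[df_is_1]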
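
One thing to watch with the multiplication is how R recycles a vector against a matrix: the vector is reused in column-major order, so a vector of per-column values does not line up with the columns unless it is expanded first. A small sketch with made-up values (m, w, recycled and per_col are hypothetical names used only here):

```r
m <- matrix(1, nrow = 4, ncol = 2)   # hypothetical 4 x 2 matrix of ones
w <- c(10, 20)                       # hypothetical per-column values

# Plain multiplication recycles w down the columns:
# both columns end up as 10, 20, 10, 20
recycled <- m * w

# Indexing w with col(m) gives one value per cell, matched to its column:
# column 1 is all 10, column 2 is all 20
per_col <- m * w[col(m)]
```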

Upvotes: 1

Sotos

Reputation: 51582

You can use mapply to do that, i.e.

i1 <- 1/colSums(df)
mapply(function(x, y) replace(x, x == 1, y), df, i1)

which gives,

             a    b c
 [1,] 0.0000000 0.00 4
 [2,] 0.3333333 0.25 4
 [3,] 0.0000000 0.00 4
 [4,] 0.3333333 0.00 3
 [5,] 0.0000000 0.00 3
 [6,] 0.0000000 0.00 3
 [7,] 0.0000000 0.25 4
 [8,] 0.3333333 0.25 3
 [9,] 0.0000000 0.25 4
[10,] 0.0000000 0.00 2

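Note: your df2 values (my i1) differ from mine because you did not call set.seed() to make the rbinom() draws reproducible. Also, the rbinom() calls in the question use sizes 2 and 4 for b and c, so those columns can hold values above 1; that is why my column c contains no ones and is left untouched.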
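
For comparison, the loop from the question can be made to work as well. There, df==1 is a logical matrix covering the whole data frame (30 values) that is used to index a single column of 10 values, which is what the error message complains about. A minimal sketch that compares only the current column, using the binary data printed in the question (is_one is a name introduced here):

```r
df <- data.frame(a = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0),
                 b = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 0),
                 c = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1))
i1 <- 1 / colSums(df)

for (i in seq_along(df)) {
  is_one <- df[[i]] == 1       # logical vector for this column only
  df[[i]][is_one] <- i1[[i]]   # overwrite the ones with that column's value
}
```

i1 is computed before the loop, so overwriting the columns does not change the values that are filled in.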

Upvotes: 4
