CelineDion
CelineDion

Reputation: 1098

How do I take a subset of a column and make it into a new column with 0s instead of NAs for missing values in R?

Let's say I have a column of values and another column of factors:

    vals factors
 1:   58       B
 2:   42       B
 3:   64       A
 4:   64       A
 5:   26       B
 6:   64       A
 7:   20       A
 8:   20       A
 9:   22       A
10:   29       B
11:   60       B
12:   41       A
13:   79       A
14:   82       A
15:   11       A
16:   97       A
17:    1       B
18:   29       B
19:   90       B
20:    2       A

I want to create a new column for each factor with the vals value still in the corresponding row and a 0 for rows that that are for the other factors. So, to do this manually, it would look like this:

    vals factors     A
 1:   58       B     0
 2:   42       B     0
 3:   64       A    64
 4:   64       A    64
 5:   26       B     0
 6:   64       A    64
 7:   20       A    20
 8:   20       A    20
 9:   22       A    22
10:   29       B     0

and the same for all factors. How would I go about doing this? I tried the simplest way, which is to do:

dt$A <-  dt[factors == "A",]$vals

But that very predictably doesn't work:

Error in set(x, j = name, value = value) : 
  Supplied 12 items to be assigned to 20 items of column 'A'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Upvotes: 0

Views: 34

Answers (2)

Onyambu
Onyambu

Reputation: 79358

in base R you would do:

cbind(df, as.data.frame.matrix(xtabs(vals~.,cbind(x=1:nrow(df), df))))

    vals factors  A  B
1:    58       B  0 58
2:    42       B  0 42
3:    64       A 64  0
4:    64       A 64  0
5:    26       B  0 26
6:    64       A 64  0
7:    20       A 20  0
8:    20       A 20  0
9:    22       A 22  0
10:   29       B  0 29
11:   60       B  0 60
12:   41       A 41  0
13:   79       A 79  0
14:   82       A 82  0
15:   11       A 11  0
16:   97       A 97  0
17:    1       B  0  1
18:   29       B  0 29
19:   90       B  0 90
20:    2       A  2  0

Another way in base R will be:

cbind(df, model.matrix(~0+factors, df)* df$vals)

    vals factors factorsA factorsB
1:    58       B        0       58
2:    42       B        0       42
3:    64       A       64        0
4:    64       A       64        0
5:    26       B        0       26
6:    64       A       64        0
7:    20       A       20        0
8:    20       A       20        0
9:    22       A       22        0
10:   29       B        0       29
11:   60       B        0       60
12:   41       A       41        0
13:   79       A       79        0
14:   82       A       82        0
15:   11       A       11        0
16:   97       A       97        0
17:    1       B        0        1
18:   29       B        0       29
19:   90       B        0       90
20:    2       A        2        0

Upvotes: 2

David Jorquera
David Jorquera

Reputation: 2102

Using dplyr you could mutate the new variables with an ifelse statement where if the name of the factor is equal to the name of the variable, then show the vals value, else, get a zero.

library(dplyr)

df <- df %>% mutate(A = ifelse(factors == "A", vals, 0),
                    B = ifelse(factors == "B", vals, 0))


   vals factors  A  B
1    58       B  0 58
2    42       B  0 42
3    64       A 64  0
4    64       A 64  0
5    26       B  0 26
6    64       A 64  0
7    20       A 20  0
8    20       A 20  0
9    22       A 22  0
10   29       B  0 29
11   60       B  0 60
12   41       A 41  0
13   79       A 79  0
14   82       A 82  0
15   11       A 11  0
16   97       A 97  0
17    1       B  0  1
18   29       B  0 29
19   90       B  0 90
20    2       A  2  0

Upvotes: 1

Related Questions