eyeOfTheStorm
eyeOfTheStorm

Reputation: 351

Match associated factors into new columns in R

Using the below data, I would like to make something generalized to convert a single column with factors into separate linked columns. Here the 'letters' column would create two new columns 'a' and 'b'.

data <- data.frame(letters = c("a", "a", "b", "c"),ints = c(1, 2, 1, 1), reals = c(.01, .22, .01, .02))


   letters ints reals
       a    1  0.01
       a    2  0.22
       b    1  0.01
       b    1  0.02

Would like to return something like this.

 ints   a    b
  1   .01   NA
  2   .22   NA
  1   NA   .01
  1   NA   .02

Is there a way to do this using tidytext for example? Or perhaps expanding on something like this would work:

model.matrix( ints ~ letters + reals, data = data )

Upvotes: 1

Views: 86

Answers (3)

shiny
shiny

Reputation: 3502

There should not be c in your letters variable as you showed in your data.frame.

You can do it using dplyr and tidyr if you have many letters

data1 <- data %>% 
  dplyr::mutate(id=seq_len(nrow(.))) %>% 
  tidyr::spread(letters, reals) %>% 
  dplyr::arrange(id)

> data1
  ints id    a    b
1    1  1 0.01   NA
2    2  2 0.22   NA
3    1  3   NA 0.01
4    1  4   NA 0.02

Or if you have only two letters a and b, you can also do it using ifelse.

library(dplyr)
data1 <- data %>% 
  dplyr::mutate(a = ifelse(letters=="a"&ints%in%c(1,2), reals, NA),
                b = ifelse(letters=="b"&ints==1, reals, NA))

> data1
  letters ints reals    a    b
1       a    1  0.01 0.01   NA
2       a    2  0.22 0.22   NA
3       b    1  0.01   NA 0.01
4       b    1  0.02   NA 0.02

Upvotes: 1

akrun
akrun

Reputation: 887501

We can do this with dcast from data.table

library(data.table)
dcast(setDT(data), ints +rowid(ints) ~ letters, value.var = "reals")[order(ints_1)
       ][, ints_1 := NULL][]
#   ints    a    b
#1:    1 0.01   NA
#2:    2 0.22   NA
#3:    1   NA 0.01
#4:    1   NA 0.02

Upvotes: 3

thelatemail
thelatemail

Reputation: 93908

I think this is what you want, assuming a typo in your example:

reshape(
  transform(data,id=seq_len(nrow(data))),
  idvar=c("id","ints"), timevar="letters", direction="wide"
)
#  ints id reals.a reals.b
#1    1  1    0.01      NA
#2    2  2    0.22      NA
#3    1  3      NA    0.01
#4    1  4      NA    0.02

Upvotes: 4

Related Questions