Soran
Soran

Reputation: 391

R - create dynamic indicator columns from values in character columns

I have data that looks like this:

library(dplyr)

d<-data.frame(ID=c(1,1,2,3,3,4), Quality=c("Good", "Bad", "Ugly", "Good", "Good", "Ugly"), Area=c("East", "North", "North", "South", "East", "North"))

What I'd like to do is create one new column for each unique value in Quality and populate it with whether the ID matches that value and then aggregate the ID's. I want to do the same for Area.

This is what I have for when Quality == Good:

d$Quality.Good <- 0
d$Quality.Good[d$Quality=="Good"] <- 1

e <- d %>% 
      group_by(ID) %>%
      summarise(n=n(), MAX.Quality.Good = max(Quality.Good))
e

Output

A tibble: 4 x 3
 ID       MAX.Quality.Good
  <dbl>    <dbl>
1     1        1
2     2        0
3     3        1
4     4        0

Is it possible to build a function that will loop through each character column and build an indicator column for Good, Bad, Ugly, North, East, South instead of copy pasting the above many more times?

Here's where I'm stuck:

library(stringr)

#vector of each Quality
e <-d %>% 
  group_by(Quality) %>%
  summarise(n=n()) %>%
  select(Quality)
e<-as.data.frame(e)

#create new column names
f <- str_c(names(e),".",e[,1]) 

#initialize list of new columns
d[f] <- 0

#I'm stuck after this...

Thank you!

Upvotes: 1

Views: 316

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269654

1) Base R Create the model matrix for each column (using function make_mm) and bind them together as a data frame m. Finally aggregate on ID. No packages are used.

make_mm <- function(nm, data) model.matrix(~ . - 1, data[nm])
m <- do.call("data.frame", lapply(names(d)[-1], make_mm, d))
with(d, aggregate(. ~ ID, m, max))

giving:

  ID QualityBad QualityGood QualityUgly AreaEast AreaNorth AreaSouth
1  1          1           1           0        1         1         0
2  2          0           0           1        0         1         0
3  3          0           1           0        1         0         1
4  4          0           0           1        0         1         0

2) dplyr/purrr This could alternately be written as the following which is close to the code in the question but generalizes to all required columns. Note that here we make model data frames using make_md rather than making model matrices with make_mm. Also note that the dot in group_by(m, ID = .$ID) refers to d and not to m.

library(dplyr)
library(purrr)

make_md <- function(nm, data) {
  data %>% 
    select(nm) %>% 
    model.matrix(~ . - 1, .) %>% 
    as.data.frame
}

d %>% {
  m <- map_dfc(names(.)[-1], make_md, .)
  group_by(m, ID = .$ID) %>%
  summarize_all(max) %>%
  ungroup
}

Upvotes: 1

akrun
akrun

Reputation: 887158

We can do this in base R using table by replicating the 'ID' column by the number of columns of dataset minus 1, and pasteing the column names with the unlisted values (excluding the 'ID' column)

table(rep(d$ID, 2), paste0(names(d)[-1][col(d[-1])], unlist(d[-1])))
#       AreaEast AreaNorth AreaSouth QualityBad QualityGood QualityUgly
#  1        1         1         0          1           1           0
#  2        0         1         0          0           0           1
#  3        1         0         1          0           2           0
#  4        0         1         0          0           0           1

or with tidyverse, gather into 'long' format, unite the 'key', 'val' columns to a single column, get the distinct rows, and spread into 'wide' format after creating a column of 1s.

library(tidyverse)
gather(d, key, val, -ID) %>%
   unite(kv, key, val) %>% 
   distinct %>%
   mutate(n = 1) %>% 
   spread(kv, n, fill = 0)
#ID Area_East Area_North Area_South Quality_Bad Quality_Good Quality_Ugly
#1  1         1          1          0           1            1            0
#2  2         0          1          0           0            0            1
#3  3         1          0          1           0            1            0
#4  4         0          1          0           0            0            1

Upvotes: 1

Related Questions