Reputation: 489

R - adding a count table in a dataframe

I would like to get a count table from several values in my dataframe, which are spread over several columns. In my case, several of my subjects have one or more types of medication. Now they are mentioned in several columns. I would instead like to have each medication name in a separate column, and a 1 or 0 (a count) per subject if they have it or not.

For example, my dataframe now looks like:

Pp  X           X2              X3              X4 
1   NA          NA              NA              NA 
2   trimipramin NA              NA              NA
3   Quetiapin   venlafaxin      escitalopram    NA 
4   duloxetin   amitriptylin    NA              NA
5   quetiapin   citalopram      escitalopram    Lithium 
6   NA          NA              NA              NA
7   duloxetin   escitalopram    venlafaxin      NA

I would like to it look like:

Pp  trimipramin escitalopram    quetiapin   venlafaxin  duloxetin   …
1   0           0               0           0           0           …
2   1           0               0           0           0           …
3   0           1               1           1           0           …
4   0           0               0           0           1           …   
5   0           1               1           0           0           …
6   0           0               0           0           0           …
7   0           1               0           1           1           …

Would it be possible to get such a count table added to my dataframe and not separately? I assume dplyr's spread() might be useful in this case, but I couldn't get it to work :-( Any help?

Upvotes: 0

Answers (3)

Benjamin Schlegel

Reputation: 527

I do not know if it is the easiest solution, but it works. First you have to get all medications in a vector. I do that with unlist and unqiue. Then you have to loop through them and and get the number of cases. I do that with the apply function and the sum of matches. As you want it in the same data.frame you can store it with data[[name]]. This solution works without any packages.

names = as.character(na.omit(unique(unlist(data))))
for(name in names){
  data[[name]] = apply(data, FUN = function(x) { sum(x == name, na.rm = T) }, 1)
}

Data

data = read.table(text = "X           X2              X3              X4 
NA          NA              NA              NA 
trimipramin NA              NA              NA
Quetiapin   venlafaxin      escitalopram    NA 
duloxetin   amitriptylin    NA              NA
quetiapin   citalopram      escitalopram    Lithium 
NA          NA              NA              NA
duloxetin   escitalopram    venlafaxin      NA", header = T, stringsAsFactors = F)

Upvotes: 1

Simon Larsen

Reputation: 742

The table function in base is a simple method for creating this type of binary indicator matrix.

library(tidyverse)

D %>%
  gather(key, drug, -Pp) %>%
  select(-key) %>%
  table

#>Pp  amitriptylin citalopram duloxetin escitalopram Lithium ...
#> 1             0          0         0            0       0 ... 
#> 2             0          0         0            0       0 ... 
#> 3             0          0         0            1       0 ... 
#> 4             1          0         1            0       0 ... 
#> 5             0          1         0            1       1 ... 
#> 6             0          0         0            0       0 ... 
#> 7             0          0         1            1       0 ...

You can also do it using spread instead, but this solution will remove patients without any associated drugs:

D %>%
  gather(key, value, -Pp) %>%
  filter(!is.na(value)) %>% 
  mutate(key=1) %>%
  spread(value, key, fill=0, drop = FALSE)

Upvotes: 2

A. Suliman

Reputation: 13125

library(dplyr)
library(tidyr)
library(reshape2)

data %>% gather(key = "Med", value = "Value", -Pp) %>% select(-Med) %>% 
         dcast(Pp~Value,fun.aggregate = function(x){as.integer(length(x) > 0)})

Data

data <- read.table(text="
        Pp  X           X2              X3              X4 
        1   NA          NA              NA              NA 
        2   trimipramin NA              NA              NA
        3   Quetiapin   venlafaxin      escitalopram    NA 
        4   duloxetin   amitriptylin    NA              NA
        5   quetiapin   citalopram      escitalopram    Lithium 
        6   NA          NA              NA              NA
        7   duloxetin   escitalopram    venlafaxin      NA",header=T, stringsAsFactors = F)

Upvotes: 1

R - adding a count table in a dataframe

Answers (3)

Data

Data

Related Questions