Edgar Alarcón
Edgar Alarcón

Reputation: 156

How convert contingency tables (counts) to individuals for GLM

I have my info as in this photograph:

enter image description here

You can download it here: https://drive.google.com/file/d/1pgO51NXtjpVSz-VxQEDNFFuQXVc4jVkt/view?usp=sharing

What i want is to transform this data to individuals,

For example

enter image description here

Will transform into this

enter image description here

Another example

enter image description here

will turn into this

enter image description here

So, if we say that n="sum of all numbers in the original data.frame", i.e., the number of all individuals, the final output will be a data.frame with 6 columns and n rows.

I want to do this in R but i don't have any idea how. Once I have this, what i want to do is apply a generalized linear model with family binomial and link = probit.

Now, this page can explain some of what I tried to do:

https://www.datanalytics.com/libro_r/la-funcion-melt-y-datos-en-formato-largo.html

Upvotes: 0

Views: 214

Answers (2)

ekoam
ekoam

Reputation: 8844

Try this

library(readxl)
library(dplyr)
library(tidyr)

df <- read_xls("byssinosis.xls", range = cell_rows(c(4L, NA_integer_)), col_names = FALSE)
raw_nms <- read_xls("byssinosis.xls", range = cell_rows(c(1L, 3L)), col_names = FALSE)

names(df) <- with(
  fill(as.data.frame(t(raw_nms)[, -2L]), V1, V2), # replace any missing value in V1 and V2 (i.e. row 1 and 3 in your excel) with the last observation carrired forward
  trimws(paste(V1, if_else(is.na(V2), "", V2))) # collapse these names into a single vector
)

df %>% 
  pivot_longer(contains(" "), names_to = c("Workplace", "byssinosis"), names_pattern = "(\\d+) (.+)") %>% 
  slice(inverse.rle(list(lengths = value, values = seq_along(value)))) %>% 
  select(-value)

Output

# A tibble: 5,419 x 6
   Employment Smoking Sex   Race  Workplace byssinosis
   <chr>      <chr>   <chr> <chr> <chr>     <chr>     
 1 <10        yes     M     W     1         yes       
 2 <10        yes     M     W     1         yes       
 3 <10        yes     M     W     1         yes       
 4 <10        yes     M     W     1         no        
 5 <10        yes     M     W     1         no        
 6 <10        yes     M     W     1         no        
 7 <10        yes     M     W     1         no        
 8 <10        yes     M     W     1         no        
 9 <10        yes     M     W     1         no        
10 <10        yes     M     W     1         no        
# ... with 5,409 more rows

Upvotes: 1

Edgar Alarc&#243;n
Edgar Alarc&#243;n

Reputation: 156

Okay... I have an answer, but... i was wondering if there exists any generalization. Here it goes:

library(readxl)
library(dplyr)

# Información original ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
byssinosis <- read_xls(path = "byssinosis.xls",range = "B4:K27",col_names = F)
names(byssinosis) <- c("Employment","Smoking","Sex","Race",
                       "W1y","W1n","W2y","W2n","W3y","W3n")
# View(byssinosis)

# Procesando la información a individuos ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Primero pasamos las columnas a una sola.
datos <- reshape2::melt(byssinosis)
# Separamos estas columnas en las dos características deseadas.
datos <- datos %>%
  mutate(Workplace = ifelse(variable %in% c("W1y", "W1n"),1,
                            ifelse(variable %in% c("W2y", "W2n"),2,3)),
         Byssinosis = ifelse(variable %in% c("W1y", "W2y", "W3y"),"yes","no"))
# Repetimos con base en value.
individuos=rep(seq_len(nrow(datos)),datos$value)
datos <- datos[individuos,]
# Nos quedamos solo las columnas deseadas
datos <- datos %>% select(-c(variable,value))
# View(datos)

# Comprobación ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tabla <-
  table(datos) %>%
  as.data.frame() %>%  
  arrange(Employment, desc(Smoking), desc(Sex), desc(Race), Workplace, desc(Byssinosis))
# View(tabla)

Upvotes: 0

Related Questions