user21538383

Calculate intraclass correlation (ICC) on each column and return a dataframe with results

I have a dataframe based on a questionnaire that all participants answered twice. From these responses, a dataframe with all participants and questionnaire items is formed.

The dataframe looks like the following. Each row is a different participant (with a unique ID), and the suffixes '_1' and '_2' mark the data from Questionnaire 1 and Questionnaire 2, which every participant answered. Each item is a question; there are 20 items (questions).

Edited df specific for ICC (here the rows are items and the columns are participant IDs with the '_1' / '_2' suffix):

df <- data.frame(matrix(NA, nrow = 20, ncol = 130))
# Add the column names
colnames(df) <- c(paste0("ID", rep(1:65, each = 2), "_", rep(1:2)))
# Fill the dataframe with random 1's and 0's
df[] <- sample(0:1, size = nrow(df) * ncol(df), replace = TRUE)
# Set the row names
row.names(df) <- paste0("item", 1:20)
# View the dataframe
df

From the data of the two filled in questionnaires per participant I am trying to calculate the ICC per item.

However, currently I can only perform the ICC on the dataframe as a whole instead of per item. I tried:

icc_items <- function(item, df) {
  iccc <- ICC(df[item])
  data.frame(
    model =iccc$Model,
    type = iccc$Type,
    lowerbound = iccc$"lower bound"
    upperbound = iccc$"upper bound"
    p = iccc$p.value,
    icc = iccc$ICC,
    f = iccc$F.value )}
icc_col_names <- grep("^item", names(df), value = TRUE)
icc_col_names_list <- split(icc_col_names, factor(gsub("_[1|2]$", "", icc_col_names), levels = unique(gsub("_[1|2]$", "", icc_col_names))))
icc_items_list <- lapply(icc_col_names_list, \(item)
                           icc_items(item, df))
icc_items_df <- do.call(rbind, icc_items_list)
icc_items_df

The above code was originally used to calculate a different test. I tried to adapt it to the ICC, but it gives me an error.

Upvotes: 1

Views: 824

Answers (1)

M--

Reputation: 29237

Update:

Providing a solution for the new dataframe shared by OP.

Here, I am using psych::ICC, which returns several models. We can use dplyr::filter to keep only certain Model(s), for example %>% filter(Model == "Single_random_raters") or %>% filter(Model %in% c("Average_raters_absolute", "Single_random_raters")).
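As a minimal sketch of that filtering idea, assuming the psych package is installed: the toy data below is made up purely for illustration, and the names `ratings`, `res`, and `keep` are hypothetical, not part of the question. `lmer = FALSE` asks psych::ICC for its ANOVA-based estimates.

```r
library(psych)

set.seed(1)
# Hypothetical toy data: 30 subjects, each measured twice (one column per measurement)
ratings <- data.frame(m1 = rnorm(30))
ratings$m2 <- ratings$m1 + rnorm(30, sd = 0.5)

# $results is a data frame with one row per model; the model names are the row names
res <- ICC(ratings, lmer = FALSE)$results
res$Model <- rownames(res)

# Keep only the models of interest (base-R equivalent of dplyr::filter)
keep <- c("Single_random_raters", "Average_random_raters")
res[res$Model %in% keep, c("Model", "type", "ICC")]
```

The same subset can be written with dplyr as `filter(Model %in% keep)` once the row names have been moved into a `Model` column.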

Explanation:

In this solution, I add the item names as a column (rownames_to_column). Then, using pivot_longer, I bring the ID##_# columns (participant ID and measure number) into long format, and split that name into ID and Measure using separate.

Then I convert ID to be integers by removing the word "ID" and using as.integer (psych::ICC needs the values to be numeric).

Next, I create a new column with Item and Measure which then will be used to split the data per OP's request to get one ICC per Item per Measure.

Using purrr::map, I loop over each of the dataframes created by split and reshape them into wide format (keeping only ID and value, i.e. the survey responses). Then I calculate the ICC for each dataframe and extract the results element, a dataframe with the information OP is seeking, as calculated by the psych::ICC function.

Model is included in the results, but as the row names. I convert them to a column, and then use bind_rows to put all the ICCs into a single dataframe.

Finally, I select the desired columns and assign cleaner names for the final output.

suppressMessages is used to silence the messages that the ICC function emits. as_tibble is just a preference and can be omitted. The filter mentioned above, for keeping only certain models, should be added at the end (currently commented out). Item_Measure can also be split into two columns using tidyr::separate if needed.

library(dplyr)
library(tibble)
library(tidyr)
library(purrr)
library(psych)

suppressMessages(
df %>% 
  rownames_to_column("Item") %>% 
  pivot_longer(-Item) %>% 
  separate(name, into = c("ID", "Measure")) %>% 
  mutate(ID = as.integer(gsub("ID", "", ID))) %>% 
  unite("Item_m", c(Item, Measure), remove = FALSE) %>% 
  split(., list(.$Item, .$Measure), drop = TRUE, sep = "_") %>% 
  map(~pivot_wider(.x, id_cols = ID, names_from = Item_m, values_from = value)) %>% 
  map(~ICC(.x)$results) %>% 
  map(~rownames_to_column(.x, "Model")) %>% 
  bind_rows(.id = "Item_Measure") %>% 
  select(Item_Measure, Model, Type = type, 
         Lower_Bound = `lower bound`, Upper_Bound = `upper bound`, 
         p_value = p, ICC_value = ICC, F_value =`F`) %>% 
  as_tibble() # %>%
#  filter(Model %in% c("Average_raters_absolute", "Single_random_raters"))
)
#> # A tibble: 240 x 8
#>    Item_Measure Model    Type  Lower_Bound Upper_Bound p_value ICC_value F_value
#>    <chr>        <chr>    <chr>       <dbl>       <dbl>   <dbl>     <dbl>   <dbl>
#>  1 item1_1      Single_~ ICC1      -0.731      -0.413    1.00     -0.595   0.254
#>  2 item1_1      Single_~ ICC2      -0.0521      0.0750   0.500     0       1    
#>  3 item1_1      Single_~ ICC3      -0.242       0.242    0.500     0       1    
#>  4 item1_1      Average~ ICC1k     -5.45       -1.41     1.00     -2.94    0.254
#>  5 item1_1      Average~ ICC2k     -0.110       0.140    0.500     0       1    
#>  6 item1_1      Average~ ICC3k     -0.639       0.390    0.500     0       1    
#>  7 item10_1     Single_~ ICC1      -0.731      -0.412    1.00     -0.595   0.254
#>  8 item10_1     Single_~ ICC2      -0.0521      0.0751   0.500     0       1    
#>  9 item10_1     Single_~ ICC3      -0.242       0.242    0.500     0       1    
#> 10 item10_1     Average~ ICC1k     -5.44       -1.40     1.00     -2.94    0.254
#> # i 230 more rows
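If what is wanted is a single ICC per item that compares the first and second administration (rather than one result per Item and Measure), the reshaping can also be done row by row in base R. The following is only a sketch under that assumption, not the approach used above. It assumes psych is installed; `measure`, `icc_per_item`, and the `m1`/`m2` column names are illustrative, and `lmer = FALSE` requests the ANOVA-based estimates.

```r
library(psych)

set.seed(42)
# Same layout as the dataset in this answer: rows are items, columns are ID<k>_<measure>
df <- data.frame(matrix(sample(0:1, 20 * 130, replace = TRUE), nrow = 20))
colnames(df) <- paste0("ID", rep(1:65, each = 2), "_", rep(1:2, times = 65))
row.names(df) <- paste0("item", 1:20)

# "1" or "2" for every column, taken from the suffix after the underscore
measure <- sub("^ID\\d+_", "", colnames(df))

icc_per_item <- lapply(row.names(df), function(item) {
  # One row per participant, one column per measurement
  wide <- data.frame(m1 = unlist(df[item, measure == "1"]),
                     m2 = unlist(df[item, measure == "2"]))
  res <- ICC(wide, lmer = FALSE)$results
  # Move the model names out of the row names and tag the rows with the item
  data.frame(Item = item, Model = rownames(res), res,
             row.names = NULL, check.names = FALSE)
})

icc_items_df <- do.call(rbind, icc_per_item)
head(icc_items_df)
```

This yields six rows (one per psych model) for each of the 20 items; which model is appropriate depends on the study design.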

New Dataset:

df <- data.frame(matrix(NA, nrow = 20, ncol = 130))
# Add the column names
colnames(df) <- c(paste0("ID", rep(1:65, each = 2), "_", rep(1:2)))
# Fill the dataframe with random 1's and 0's
df[] <- sample(0:1, size = nrow(df) * ncol(df), replace = TRUE)
# Set the row names
row.names(df) <- paste0("item", 1:20)


Original Answer:

Here, I subset the data to only get the columns that match an item name.

names(df)[startsWith(names(df), "item1_")]
[1] "item1_1" "item1_2"

Then I subset the data to only those columns for the icc calculation. The result is stored in a dataframe. I included every variable, but any variable that is not wanted in the final results can be commented out in the icc_item function. I also added an Item column so the item name appears in the final results.

In the loop, I iterate over the items rather than over every column: I remove the counter after the underscore (i.e. the 1 in item##_1 or the 2 in item##_2) and keep only the item name with its underscore (i.e. item##_), then take the unique values.
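The prefix extraction can be checked in isolation with base R. This small sketch rebuilds only the column names of the original dataset (`cols` and `prefixes` are illustrative names) and shows why the trailing underscore is kept:

```r
# Column names laid out like the original dataset: ID, item1_1, item1_2, ...
cols <- c("ID", paste0(rep(paste0("item", 1:20), each = 2), c("_1", "_2")))

# Drop the ID column, strip the trailing counter, keep the underscore
prefixes <- unique(gsub("_\\d+$", "_", cols[-1]))
head(prefixes, 3)

# With the underscore, "item1_" matches only the two item1 columns
cols[startsWith(cols, "item1_")]

# Without it, "item1" would also pick up item10 through item19
length(cols[startsWith(cols, "item1")])
```

Each element of `prefixes` is what gets passed as `item` to the function below.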

library(irr)
#> Loading required package: lpSolve

icc_item <- function(item, df) {
  items <- names(df)[startsWith(names(df), item)]
  iccc  <- icc(df[items])
  data.frame(
    Item       = gsub("_", "", item),
    Unit       = iccc$unit,
    Model      = iccc$model,
    Type       = iccc$type,
    Subjects   = iccc$subjects,
    Raters     = iccc$raters,
    ICC_Name   = iccc$icc.name,
    ICC_Value  = iccc$value,
    R_zero     = iccc$r0,
    f_value    = iccc$Fvalue,
    p_value    = iccc$p.value,
    Conf_level = iccc$conf.level,
    LowerBound = iccc$lbound,
    UpperBound = iccc$ubound
  )
}

do.call(rbind, lapply(unique(gsub("_\\d+$", "_", names(df)[-1])), 
                      function(item) icc_item(item, df)))

Original Dataset:

df <- data.frame(matrix(0, nrow = 51, ncol = 41))
# Set the column names for the first column and items columns
colnames(df) <- c("ID", paste(rep(paste0("item", 1:20), each = 2), c("_1", "_2"), sep = ""))
# Fill the ID column with values 1 to 51
df$ID <- 1:51
# Fill the item columns with random 0's and 1's
set.seed(123) # Set seed for reproducibility
df[, 2:41] <- matrix(sample(c(0, 1), size = 20 * 2 * 51, replace = TRUE), ncol = 40)

Created on 2023-05-05 with reprex v2.0.2

Upvotes: 0
