I have a data frame based on a questionnaire that every participant answered twice. From these answers, a data frame with all participants and questionnaire items was built.
The data frame looks like the following: each row is a different participant (with a unique ID), and the suffixes '_1' and '_2' on each item hold the data from Questionnaire 1 and Questionnaire 2, which all participants answered. Each item is a question; there are 20 items (questions).
Edit: a data frame restructured specifically for the ICC (one row per item, one column per participant and questionnaire):
df <- data.frame(matrix(NA, nrow = 20, ncol = 130))
# Add the column names
colnames(df) <- c(paste0("ID", rep(1:65, each = 2), "_", rep(1:2)))
# Fill the dataframe with random 1's and 0's
df[] <- sample(0:1, size = nrow(df) * ncol(df), replace = TRUE)
# Set the row names
row.names(df) <- paste0("item", 1:20)
# View the dataframe
df
From the two completed questionnaires per participant, I am trying to calculate the ICC per item.
However, I can currently only compute the ICC on the data frame as a whole, not per item. I tried:
icc_items <- function(item, df) {
  iccc <- ICC(df[item])
  data.frame(
    model = iccc$Model,
    type = iccc$Type,
    lowerbound = iccc$"lower bound",
    upperbound = iccc$"upper bound",
    p = iccc$p.value,
    icc = iccc$ICC,
    f = iccc$F.value)
}
icc_col_names <- grep("^item", names(df), value = TRUE)
# strip the "_1"/"_2" suffix to get the item name ([12] is a character class, no "|" needed)
item_names <- gsub("_[12]$", "", icc_col_names)
icc_col_names_list <- split(icc_col_names, factor(item_names, levels = unique(item_names)))
icc_items_list <- lapply(icc_col_names_list, \(item) icc_items(item, df))
icc_items_df <- do.call(rbind, icc_items_list)
icc_items_df
The code above was originally written for a different test. I adapted it for the ICC (or at least I tried to), but it gives me an error.
Answer:
Here is a solution for the new data frame shared by the OP.
I am using psych::ICC, which reports several models. You can use dplyr::filter to keep only certain model(s), for example %>% filter(Model == "Single_random_raters") or %>% filter(Model %in% c("Average_raters_absolute", "Single_random_raters")).
Explanation:
In this solution, I add the items as a column (rownames_to_column), then use pivot_longer to bring the ID##_# column names into long format, and separate splits each of them into ID and Measure.
Then I convert ID to integers by removing the prefix "ID" and calling as.integer (psych::ICC needs the values to be numeric).
Next, I create a new column combining Item and Measure, which is then used to split the data, per the OP's request, so that there is one ICC per Item per Measure.
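To make the split() step concrete, here is a small base-R sketch on a made-up long table (two items, two participants, two measures; the values are invented for illustration). It shows how split() with a list of factors and sep = "_" names the pieces, and that each piece holds one row per participant for a single measure.

```r
# Toy long-format data (invented values): 2 items x 2 measures x 2 participants
long <- data.frame(
  Item    = rep(c("item1", "item2"), each = 4),
  Measure = rep(c("1", "2"), times = 4),
  ID      = rep(1:2, each = 2, times = 2),
  value   = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Split by the Item/Measure combination; names are built as "<Item>_<Measure>"
pieces <- split(long, list(long$Item, long$Measure), drop = TRUE, sep = "_")

names(pieces)                      # "item1_1" "item2_1" "item1_2" "item2_2"
vapply(pieces, nrow, integer(1))   # each piece: one row per participant
```

Because every piece contains only one measure, it carries a single response per participant.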
Using purrr::map, I loop over each of the data frames created by split and reshape it into wide format (including only ID and value, i.e. the survey responses). Then I calculate the ICC for each data frame and extract results, a data frame with the information the OP is after, as computed by the psych::ICC function.
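For reference, the single-measure ICCs that psych::ICC reports as ICC1, ICC2 and ICC3 follow Shrout and Fleiss and are built from ANOVA mean squares. The sketch below recomputes those three forms in base R for an n x k matrix (subjects in rows, measurements in columns). icc_single is a hypothetical helper written for this illustration, not part of psych, and it omits the average-measure forms (ICC1k etc.) as well as the confidence intervals.

```r
# Hypothetical helper: Shrout & Fleiss single-measure ICCs from an n x k matrix
icc_single <- function(x) {
  x  <- as.matrix(x)
  n  <- nrow(x)
  k  <- ncol(x)
  gm <- mean(x)
  ss_total <- sum((x - gm)^2)
  ss_rows  <- k * sum((rowMeans(x) - gm)^2)   # between subjects
  ss_cols  <- n * sum((colMeans(x) - gm)^2)   # between measurements
  ss_err   <- ss_total - ss_rows - ss_cols    # residual (two-way)
  msr <- ss_rows / (n - 1)
  msw <- (ss_total - ss_rows) / (n * (k - 1)) # within subjects (one-way)
  msc <- ss_cols / (k - 1)
  mse <- ss_err / ((n - 1) * (k - 1))
  c(ICC1 = (msr - msw) / (msr + (k - 1) * msw),
    ICC2 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
    ICC3 = (msr - mse) / (msr + (k - 1) * mse))
}

q1 <- c(0, 1, 1, 0, 1, 0)
icc_single(cbind(q1, q1))       # identical answers on both occasions: all three are 1
icc_single(cbind(q1, 1 - q1))   # every answer flipped: ICC1 and ICC3 are -1
```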
The model names are included in results, but as row names. I convert them to a column and then use bind_rows to put all the ICCs into a single data frame.
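The same "row names to column, then stack with an identifier" step can be sketched in base R. The two small tables below are made-up stand-ins for ICC()$results (the ICC values are invented), and Map() plus do.call(rbind, ...) play the roles of rownames_to_column and bind_rows(.id = ...).

```r
# Made-up stand-ins for two ICC()$results tables (values invented for illustration)
models <- c("Single_raters_absolute", "Average_raters_absolute")
res <- list(
  item1 = data.frame(ICC = c(0.10, 0.20), row.names = models),
  item2 = data.frame(ICC = c(0.30, 0.40), row.names = models)
)

# Move the row names into a Model column and tag each table with its list name
stacked <- do.call(rbind, Map(function(d, id) {
  data.frame(Item = id, Model = rownames(d), d, row.names = NULL)
}, res, names(res)))
rownames(stacked) <- NULL   # drop the "item1.1"-style row names rbind creates
stacked
```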
Finally, I select the desired columns and give them cleaner names for the final output.
suppressMessages silences the messages that the ICC function prints (it does not silence warnings; suppressWarnings would be needed for those). as_tibble is just a preference and can be dropped. The filter mentioned above for keeping only certain models should be added at the end (currently commented out). Item_Measure can also be split into two columns with tidyr::separate if needed.
library(dplyr)
library(tibble)
library(tidyr)
library(purrr)
library(psych)
suppressMessages(
  df %>%
    rownames_to_column("Item") %>%
    pivot_longer(-Item) %>%
    separate(name, into = c("ID", "Measure")) %>%
    mutate(ID = as.integer(gsub("ID", "", ID))) %>%
    unite("Item_m", c(Item, Measure), remove = FALSE) %>%
    split(., list(.$Item, .$Measure), drop = TRUE, sep = "_") %>%
    map(~pivot_wider(.x, id_cols = ID, names_from = Item_m, values_from = value)) %>%
    map(~ICC(.x)$results) %>%
    map(~rownames_to_column(.x, "Model")) %>%
    bind_rows(.id = "Item_Measure") %>%
    select(Item_Measure, Model, Type = type,
           Lower_Bound = `lower bound`, Upper_Bound = `upper bound`,
           p_value = p, ICC_value = ICC, F_value = `F`) %>%
    as_tibble() # %>%
    # filter(Model %in% c("Average_raters_absolute", "Single_random_raters"))
)
#> # A tibble: 240 x 8
#> Item_Measure Model Type Lower_Bound Upper_Bound p_value ICC_value F_value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 item1_1 Single_~ ICC1 -0.731 -0.413 1.00 -0.595 0.254
#> 2 item1_1 Single_~ ICC2 -0.0521 0.0750 0.500 0 1
#> 3 item1_1 Single_~ ICC3 -0.242 0.242 0.500 0 1
#> 4 item1_1 Average~ ICC1k -5.45 -1.41 1.00 -2.94 0.254
#> 5 item1_1 Average~ ICC2k -0.110 0.140 0.500 0 1
#> 6 item1_1 Average~ ICC3k -0.639 0.390 0.500 0 1
#> 7 item10_1 Single_~ ICC1 -0.731 -0.412 1.00 -0.595 0.254
#> 8 item10_1 Single_~ ICC2 -0.0521 0.0751 0.500 0 1
#> 9 item10_1 Single_~ ICC3 -0.242 0.242 0.500 0 1
#> 10 item10_1 Average~ ICC1k -5.44 -1.40 1.00 -2.94 0.254
#> # i 230 more rows
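Note that in the pipeline above each piece produced by split() holds a single measurement per participant, and pivot_wider(id_cols = ID, ...) keeps ID as a column, so ICC() receives the ID numbers as one of its two "raters". If what is wanted is the test-retest ICC of each item (Questionnaire 1 vs. Questionnaire 2), the data would instead need one participants x 2 matrix per item. Below is a base-R sketch of that reshaping, assuming the question's edited layout (items as rows, columns named ID#_#); item_matrix is a hypothetical helper, and each resulting matrix could then be passed to psych::ICC() or irr::icc().

```r
# Rebuild the question's edited layout: items as rows, one column per ID and questionnaire
set.seed(1)
df <- data.frame(matrix(sample(0:1, 20 * 130, replace = TRUE), nrow = 20))
colnames(df) <- paste0("ID", rep(1:65, each = 2), "_", rep(1:2))
row.names(df) <- paste0("item", 1:20)

# Hypothetical helper: one item -> participants x 2 matrix, without an ID column
item_matrix <- function(df, item) {
  vals    <- unlist(df[item, ])                          # named "ID1_1", "ID1_2", ...
  id      <- sub("_\\d+$", "", names(vals))              # "ID12_2" -> "ID12"
  measure <- as.integer(sub("^.*_", "", names(vals)))    # "ID12_2" -> 2
  ids <- unique(id)
  out <- matrix(NA_real_, nrow = length(ids), ncol = 2,
                dimnames = list(ids, c("Q1", "Q2")))
  out[cbind(match(id, ids), measure)] <- vals            # place each answer by (ID, measure)
  out
}

# One matrix per item, named by item
per_item <- lapply(setNames(rownames(df), rownames(df)), item_matrix, df = df)
dim(per_item$item1)   # 65 participants x 2 questionnaires
```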
Data used above (the same edited df as in the question):
df <- data.frame(matrix(NA, nrow = 20, ncol = 130))
# Add the column names
colnames(df) <- c(paste0("ID", rep(1:65, each = 2), "_", rep(1:2)))
# Fill the dataframe with random 1's and 0's
df[] <- sample(0:1, size = nrow(df) * ncol(df), replace = TRUE)
# Set the row names
row.names(df) <- paste0("item", 1:20)
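Original answer (for the original data frame, which has an ID column and one column per item and questionnaire):
Here, I subset the data to keep only the columns that match an item name.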
names(df)[startsWith(names(df), "item1_")]
#> [1] "item1_1" "item1_2"
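The trailing underscore in the prefix matters: startsWith() does a plain prefix match, so "item1" would also pick up the item10 to item19 columns. A small base-R illustration with a shortened, made-up set of column names:

```r
nm <- c("ID", "item1_1", "item1_2", "item10_1", "item10_2")

startsWith(nm, "item1")    # FALSE TRUE TRUE TRUE TRUE: item10_* matches too
startsWith(nm, "item1_")   # FALSE TRUE TRUE FALSE FALSE: only item1_*
nm[startsWith(nm, "item1_")]
```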
Then I subset the data to those columns for the ICC calculation and store the result in a data frame. I included every variable; any variable you don't want in the final results can be commented out in the icc_item function. I also added an Item column so the item name appears in the final results.
Rather than looping over every column, I loop over the items: I strip the trailing counter after the underscore (item##_1 and item##_2 both become item##_), keep the unique item prefixes, and pass each one to icc_item.
library(irr)
#> Loading required package: lpSolve
icc_item <- function(item, df) {
  # all columns belonging to this item, e.g. "item1_1" and "item1_2"
  items <- names(df)[startsWith(names(df), item)]
  iccc <- icc(df[items])
  data.frame(
    Item = gsub("_", "", item),
    Unit = iccc$unit,
    Model = iccc$model,
    Type = iccc$type,
    Subjects = iccc$subjects,
    Raters = iccc$raters,
    ICC_Name = iccc$icc.name,
    ICC_Value = iccc$value,
    R_zero = iccc$r0,
    f_value = iccc$Fvalue,
    p_value = iccc$p.value,
    Conf_level = iccc$conf.level,
    LowerBound = iccc$lbound,
    UpperBound = iccc$ubound
  )
}
# one prefix per item ("item1_", ..., "item20_"); [-1] drops the ID column
do.call(rbind, lapply(unique(gsub("_\\d+$", "_", names(df)[-1])),
                      function(item) icc_item(item, df)))
Data for the original answer:
df <- data.frame(matrix(0, nrow = 51, ncol = 41))
# Set the column names for the first column and items columns
colnames(df) <- c("ID", paste(rep(paste0("item", 1:20), each = 2), c("_1", "_2"), sep = ""))
# Fill the ID column with values 1 to 51
df$ID <- 1:51
# Fill the item columns with random 0's and 1's
set.seed(123) # Set seed for reproducibility
df[, 2:41] <- matrix(sample(c(0, 1), size = 20 * 2 * 51, replace = TRUE), ncol = 40)
Created on 2023-05-05 with reprex v2.0.2