Reputation: 75
I'm analyzing pilot data for an experiment in which each participant rates 60 pairs of auditory stimuli, drawn from a pool of 190 pairs, on a 4-point scale. Because each participant rates a different subset of pairs, I get a lot of missing values.
I don't care which participant said what; I just need all the ratings for the same pair to be in the same row so I can run a Light's Kappa test for inter-rater agreement on each pair in n with kappam.light (irr package).
Here is the head of my data for 15 participants, where n is the number of the pair and m is the participant:
> head(my.data)
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
1   NA    1   NA    1   NA   NA   NA   NA    2     2    NA    NA    NA     3    NA
2   NA    3   NA   NA    3   NA   NA   NA    3     3    NA    NA     4    NA     3
3   NA   NA    1   NA   NA    4   NA    1   NA    NA     1     3    NA    NA     3
4   NA   NA    2   NA    1   NA   NA    1   NA    NA    NA    NA    NA    NA    NA
5    1   NA   NA    1   NA   NA   NA    1   NA    NA     4     1    NA    NA    NA
6    2   NA   NA   NA    1   NA   NA   NA    1     3    NA    NA    NA     2    NA
The output I want (if possible) is the following:
  [,1] [,2] [,3] [,4] [,5] [,6]
1    1    1    2    2    3
2    3    3    3    3    4    3
3    1    4    1    1    3    3
4    2    1    1
5    1    1    1    4    1
6    2    1    1    3    2
I'm not sure if R will allow varying row lengths in a data frame/matrix, but it would be great to get rid of as many missing values as possible so kappam.light won't just disregard the whole row.
Upvotes: 2
Views: 3289
Reputation: 269461
If you don't mind leaving the all-NA columns in m2, the second line of code can be omitted:
m2 <- t(apply(m, 1, function(x) x[order(is.na(x))])) # sort NAs to the end of each row
m2[, !!colSums(!is.na(m2))] # keep only columns with at least one non-NA value
Alternatively, the last line could have been: m2[, apply(m2, 2, function(x) any(!is.na(x)))]
The result is:
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    3   NA
[2,]    3    3    3    3    4    3
[3,]    1    4    1    1    3    3
[4,]    2    1    1   NA   NA   NA
[5,]    1    1    1    4    1   NA
[6,]    2    1    1    3    2   NA
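To see why indexing with order(is.na(x)) moves the NAs to the end of a row without reordering the ratings, it may help to run the idea on a single vector. The snippet below is only an illustrative sketch: the vector x and the name x_sorted are made up for the example. It relies on FALSE sorting before TRUE and on order() leaving ties in their original order.

```r
# A made-up row with gaps (not taken from the real data)
x <- c(NA, 1, NA, 1, NA, 2, 2, NA, 3)

is.na(x)          # FALSE marks an observed rating, TRUE a missing one
order(is.na(x))   # positions of the FALSE entries first, ties kept in original order

# Indexing with that ordering keeps the ratings in sequence and pushes NAs right
x_sorted <- x[order(is.na(x))]
x_sorted
```

Applying the same function to every row with apply(m, 1, ...) gives one packed column per row, which is why the result is transposed with t().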
Note: We used this as the input, m:
m <-
structure(c(NA, NA, NA, NA, 1L, 2L, 1L, 3L, NA, NA, NA, NA, NA,
NA, 1L, 2L, NA, NA, 1L, NA, NA, NA, 1L, NA, NA, 3L, NA, 1L, NA,
1L, NA, NA, 4L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L,
1L, 1L, NA, 2L, 3L, NA, NA, NA, 1L, 2L, 3L, NA, NA, NA, 3L, NA,
NA, 1L, NA, 4L, NA, NA, NA, 3L, NA, 1L, NA, NA, 4L, NA, NA, NA,
NA, 3L, NA, NA, NA, NA, 2L, NA, 3L, 3L, NA, NA, NA), .Dim = c(6L,
15L), .Dimnames = list(NULL, NULL))
Next time please provide your data in this form using dput.
Upvotes: 3
Reputation: 3622
Would something like this work?
# initialize empty data frame
datt <- data.frame()
library(plyr)
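# assumes my.data is a data frame; convert a matrix first with as.data.frame(my.data)
for (i in seq_len(nrow(my.data))) {
  myd <- my.data[i, ]
  # keep the non-NA columns; drop = FALSE keeps a single remaining column as a data frame
  myd <- myd[, which(!is.na(myd)), drop = FALSE]
  names(myd) <- seq_along(myd)
  datt <- rbind.fill(datt, myd)
}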
datt
  1 2 3  4  5  6
1 1 1 2  2  3 NA
2 3 3 3  3  4  3
3 1 4 1  1  3  3
4 2 1 1 NA NA NA
5 1 1 1  4  1 NA
6 2 1 1  3  2 NA
Upvotes: 2
Reputation: 22293
You can easily get rid of NA values in a list. On the other hand, both matrix and data.frame need to have constant row length. Here's one way to do this:
# list removing NA's
lst <- apply(my.data, 1, function(x) x[!is.na(x)])
# maximum length
ll <- max(sapply(lst, length))
# combine
t(sapply(lst, function(x) c(x, rep(NA, ll-length(x)))))
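One caveat that may be worth checking: apply(my.data, 1, ...) returns a list only when the rows keep different numbers of values. If every row happened to have the same number of non-NA ratings, it would return a matrix instead, and the sapply(lst, length) step would no longer count per row. The following is a hedged sketch of a variant that avoids this by looping over row indices with lapply. The helper name pack_left and the small ratings matrix are hypothetical, made up for illustration.

```r
# Hypothetical helper: move the non-NA values of each row to the left and
# pad with NA on the right so that all rows have the same length.
pack_left <- function(mat) {
  # lapply over row indices always returns a list, whatever the row lengths are
  rows <- lapply(seq_len(nrow(mat)), function(i) mat[i, !is.na(mat[i, ])])
  width <- max(lengths(rows))
  # pad every row to the common width, then stack the rows into a matrix
  do.call(rbind, lapply(rows, function(r) c(r, rep(NA, width - length(r)))))
}

# Made-up example data (not the data from the question)
ratings <- rbind(c(NA, 1, NA, 2),
                 c(3, NA, 4, NA),
                 c(NA, NA, 1, NA))
packed <- pack_left(ratings)
packed
```

Stacking with do.call(rbind, ...) rather than t(sapply(...)) also keeps the result a matrix with one row per pair when the widest row holds only a single rating.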
Upvotes: 4