DavidLopezM

Reputation: 75

Remove NAs from data frame without deleting entire rows/columns

I'm analyzing some pilot data for an experiment in which each participant rates 60 pairs of auditory stimuli, drawn from a pool of 190 pairs, on a 4-point scale. I get a lot of missing values because the participants rate different pairs each time.

I really don't care about which participant said what, I just need all the ratings for the same pair to be in the same row so I can perform a Light's Kappa test for inter-rater agreement on each pair in n with kappam.light (irr package).

Here is the head of my data for 15 participants, where n is the number of the pair and m is the participant:

> head(my.data)
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
1   NA    1   NA    1   NA   NA   NA   NA    2     2    NA    NA    NA     3    NA
2   NA    3   NA   NA    3   NA   NA   NA    3     3    NA    NA     4    NA     3
3   NA   NA    1   NA   NA    4   NA    1   NA    NA     1     3    NA    NA     3
4   NA   NA    2   NA    1   NA   NA    1   NA    NA    NA    NA    NA    NA    NA
5    1   NA   NA    1   NA   NA   NA    1   NA    NA     4     1    NA    NA    NA
6    2   NA   NA   NA    1   NA   NA   NA    1     3    NA    NA    NA     2    NA

The output I want (if possible) is the following:

   [,1] [,2] [,3] [,4] [,5] [,6]
1    1    1    2    2    3
2    3    3    3    3    4    3
3    1    4    1    1    3    3
4    2    1    1   
5    1    1    1    4    1  
6    2    1    1    3    2   

I'm not sure if R will allow varying row lengths in a data frame/matrix, but it would be great to get rid of as many missing values as possible so kappam.light won't just disregard the whole row.

Upvotes: 2

Views: 3289

Answers (3)

G. Grothendieck

Reputation: 269461

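The first line below moves the NAs to the end of each row, and the second drops the columns that are then entirely NA. If you don't mind keeping the all-NA columns in m2, the second line can be omitted: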

m2 <- t(apply(m, 1, function(x) x[order(is.na(x))])) # sort NAs to end of each row
m2[, !!colSums(!is.na(m2))] 

Alternatively, the last line could have been: m2[, apply(m2, 2, function(x) any(!is.na(x)))]

The result is:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    2    2    3   NA
[2,]    3    3    3    3    4    3
[3,]    1    4    1    1    3    3
[4,]    2    1    1   NA   NA   NA
[5,]    1    1    1    4    1   NA
[6,]    2    1    1    3    2   NA
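To see why the two lines work, it may help to look at a single row and a tiny matrix. This is only an illustrative sketch: the vector x and the small matrix toy are made-up examples, not part of the question's data.

```r
# x and toy are made-up illustrations, not the question's data.
x <- c(NA, 1, NA, 1, NA, 2, 3)

# is.na(x) is FALSE for ratings and TRUE for NAs. order() sorts FALSE before
# TRUE and keeps ties in their original order, so the ratings come first.
idx <- order(is.na(x))
sorted <- x[idx]

# For the column filter: colSums(!is.na(.)) counts the non-NA values per
# column, and !! turns "count > 0" into TRUE/FALSE.
toy <- rbind(c(1, 2, NA),
             c(3, NA, NA))
keep <- !!colSums(!is.na(toy))
toy[, keep]
```

Here idx is 2 4 6 7 1 3 5, sorted is 1 1 2 3 NA NA NA, and keep is TRUE TRUE FALSE, so the all-NA third column is dropped.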

Note: We used this as the input, m:

m <-
structure(c(NA, NA, NA, NA, 1L, 2L, 1L, 3L, NA, NA, NA, NA, NA, 
NA, 1L, 2L, NA, NA, 1L, NA, NA, NA, 1L, NA, NA, 3L, NA, 1L, NA, 
1L, NA, NA, 4L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 
1L, 1L, NA, 2L, 3L, NA, NA, NA, 1L, 2L, 3L, NA, NA, NA, 3L, NA, 
NA, 1L, NA, 4L, NA, NA, NA, 3L, NA, 1L, NA, NA, 4L, NA, NA, NA, 
NA, 3L, NA, NA, NA, NA, 2L, NA, 3L, 3L, NA, NA, NA), .Dim = c(6L, 
15L), .Dimnames = list(NULL, NULL))

Next time please provide your data in this form using dput.

Upvotes: 3

maloneypatr

Reputation: 3622

Would something like this work?

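library(plyr)  # provides rbind.fill

# rbind.fill works on data frames, so convert in case my.data is a matrix
my.df <- as.data.frame(my.data)

# initialize empty data frame
datt <- data.frame()

for (i in seq_len(nrow(my.df))) {
    myd <- my.df[i, ]
    # keep only the non-NA ratings; drop = FALSE keeps a one-column result a data frame
    myd <- myd[, !is.na(myd), drop = FALSE]
    names(myd) <- seq_along(myd)
    datt <- rbind.fill(datt, myd)
}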

datt
  1 2 3  4  5  6
1 1 1 2  2  3 NA
2 3 3 3  3  4  3
3 1 4 1  1  3  3
4 2 1 1 NA NA NA
5 1 1 1  4  1 NA
6 2 1 1  3  2 NA

Upvotes: 2

shadow

Reputation: 22293

You can easily get rid of NA values in a list. On the other hand, both matrix and data.frame need to have constant row length. Here's one way to do this:

# list of rows with the NAs removed
lst <- apply(my.data, 1, function(x) x[!is.na(x)])
# maximum length
ll <- max(sapply(lst, length))
# pad each row with NAs to the maximum length and combine
t(sapply(lst, function(x) c(x, rep(NA, ll-length(x)))))
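One caveat with the code above: apply() simplifies its result to a vector or matrix when every row keeps the same number of values, so lst is not always a list. A possible variant that splits the rows explicitly is sketched below. The helper name squeeze_na and the two small test matrices are assumptions for illustration only.

```r
# squeeze_na() is a hypothetical helper name, not part of the original answer.
squeeze_na <- function(x) {
  x <- as.matrix(x)
  # Split into rows explicitly so the result is always a list,
  # even when every row keeps the same number of values.
  rows <- lapply(seq_len(nrow(x)), function(i) unname(x[i, !is.na(x[i, ])]))
  width <- max(lengths(rows))
  # Assigning a longer length pads each vector with NA.
  padded <- lapply(rows, function(r) { length(r) <- width; r })
  do.call(rbind, padded)
}

# Rows with different numbers of ratings
ragged <- matrix(c(NA, 1, 2,
                   NA, 3, NA), nrow = 2, byrow = TRUE)
res_ragged <- squeeze_na(ragged)

# Rows that all keep exactly one rating
even <- matrix(c(NA, 1,
                 2, NA), nrow = 2, byrow = TRUE)
res_even <- squeeze_na(even)
```

With these inputs, res_ragged is a 2 x 2 matrix whose second row ends in NA, and res_even stays a 2 x 1 matrix with one row per pair.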

Upvotes: 4
