Reputation: 279

Too many NA values in dataset for na.omit to handle

I have a text file dataset that I read as follows:

cancer1 <- read.table("cancer.txt", stringsAsFactors = FALSE, quote='', header=TRUE,sep='\t')

I then need to convert the values to numeric so that I can perform mathematical analyses on the df.

cancer<-apply(cancer1,2, as.numeric)

This introduces more than 9000 NA values into a 17980 x 598 df. With that many NA values I can't simply use na.omit, because it removes every row.

My plan is therefore to replace each NA with the mean value of its row. My attempt is as follows:

for(i in rownames(cancer)){
     cancer2<-replace(cancer, is.na(cancer), mean(cancer[i,]))
 }

However, this removes every row, just like na.omit:

dim(cancer2)
 [1]   0 598

Can someone tell me how to replace each of the NA values with the mean of that row?

Upvotes: 0

Answers (2)

johnny utah

Reputation: 279

Sorted it out with code adapted from a related post:

cancer1 <- read.table("TCGA_BRCA_Agilent_244K_microarray_genomicMatrix.txt", stringsAsFactors = FALSE, quote = '', header = TRUE, sep = '\t')
mat <- cancer1[1:800, 1:400]
mat <- t(mat)                      # after transposing, each original row is a column
mat <- apply(mat, 2, as.numeric)   # values read as character strings need to be converted to numeric
cM <- colMeans(mat, na.rm = TRUE)  # the conversion introduces >1000 NA values; each is replaced
                                   # with the mean of its original row (now a column)
indx <- which(is.na(mat), arr.ind = TRUE)
mat[indx] <- cM[indx[, 2]]         # indx[, 2] is the column index, matching colMeans

Upvotes: 0

Hack-R

Reputation: 23214

You can use rowMeans with indexing.

k <- which(is.na(cancer1), arr.ind=TRUE)
cancer1[k] <- rowMeans(cancer1, na.rm=TRUE)[k[,1]]

Here k is a two-column matrix holding the (row, column) position of every NA value, so k[,1] gives the row of each NA.
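
To see what k holds, here is a minimal sketch on a toy matrix. The name m and its values are made up for illustration and are not taken from the cancer data:

```r
# Toy numeric matrix standing in for the real data (values are made up)
m <- matrix(c(1, NA, 3,
              4,  5, NA,
              NA, 8, 9), nrow = 3, byrow = TRUE)

# One row of k per NA cell: first column is the row index, second the column index
k <- which(is.na(m), arr.ind = TRUE)

# Row means computed without the NAs, then looked up by the row index of each NA cell
row_means <- rowMeans(m, na.rm = TRUE)
m[k] <- row_means[k[, 1]]

m
```

Each NA ends up holding the mean of the non-missing values in its own row, and no loop is needed.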

This works better than my original answer, which was:

for(i in 1:nrow(cancer1)){
  for(n in 1:ncol(cancer1)){
    if(is.na(cancer1[i,n])){
      cancer1[i,n] <- mean(t(cancer1[i,]), na.rm = TRUE)  # or rowMeans(cancer1[i,], na.rm = TRUE)
    }
  }
}
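
As a rough check that the two approaches agree, here is a sketch that runs both on a small made-up matrix. It assumes the data is already a numeric matrix rather than a data frame, and the names m, vec and lp are only illustrative:

```r
# Made-up numeric matrix; the first row has two NAs to exercise the loop
m <- matrix(c(2, NA, NA, 4,
              1,  5, NA, 9,
              NA, 6,  7, 8), nrow = 3, byrow = TRUE)

# Vectorized version: rowMeans plus matrix indexing
vec <- m
k <- which(is.na(vec), arr.ind = TRUE)
vec[k] <- rowMeans(vec, na.rm = TRUE)[k[, 1]]

# Loop version: fill each NA cell with the mean of its row
lp <- m
for (i in seq_len(nrow(lp))) {
  for (n in seq_len(ncol(lp))) {
    if (is.na(lp[i, n])) {
      lp[i, n] <- mean(lp[i, ], na.rm = TRUE)
    }
  }
}

# Both results should match up to floating-point tolerance
stopifnot(isTRUE(all.equal(vec, lp)))
```

Filling one NA with the row mean leaves that row's mean unchanged, so the loop gives the same values even when a row has several NAs. It just takes far more steps on a large dataset.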

Upvotes: 2
