user3409240

Reputation: 23

How to change NAs to empty cells in R

I have a data frame with a varying number of columns (depending on the year, I have fewer or more data points). The original data are a long-format cross-sectional time series rather than a wide dataset, but I need to pull out a vector for each year (and I would also like to create country tables).

At the moment, R puts NAs at the end of the rows that have fewer data points (so some of the trailing columns contain NAs).

However, I would like to use each row as an input vector in Python code that cannot handle NAs, so I would like to replace the NAs with empty cells. Ideally I would end up with vectors of different lengths. Replacing the NAs with zeros does not work either, since I want to keep track of the different row sizes for different years. I have found answers for character data, but my data are numeric; any help would be appreciated. The goal is to write a table or CSV file without the NAs, so that I can pass each row to the Python code.
Thank you!

 mat1 <- matrix(c(3, 0, 1, 13, NA, NA, NA,
                  3, 0, 1, 13, NA, NA, NA,
                  3, 0, 1, 16, NA, NA, NA,
                  3, 0, 1, 16, NA, NA, NA,
                  0, 0, 134, 33, 39, 1, 14,
                  0, 0, 134, 33, 39, 1, 14), 7, 6)
print(t(mat1))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    3    0    1   13   NA   NA   NA
 [2,]    3    0    1   13   NA   NA   NA
 [3,]    3    0    1   16   NA   NA   NA
 [4,]    3    0    1   16   NA   NA   NA
 [5,]    0    0  134   33   39    1   14
 [6,]    0    0  134   33   39    1   14

As a data.frame:

print(as.data.frame(t(mat1)))
   V1 V2  V3 V4 V5 V6 V7
 1  3  0   1 13 NA NA NA
 2  3  0   1 13 NA NA NA
 3  3  0   1 16 NA NA NA
 4  3  0   1 16 NA NA NA
 5  0  0 134 33 39  1 14
 6  0  0 134 33 39  1 14

Upvotes: 0

Views: 5601

Answers (1)

Ben Bolker

Reputation: 226577

Depending on how you're passing the rows to Python, there are various ways of handling this, but none of them corresponds to "emptying cells": an NA value is already (arguably) the most sensible way to code an empty cell in a rectangular array in R.

 mat1 <- matrix(c(3, 0, 1, 13, NA, NA, NA,
                  3, 0, 1, 13, NA, NA, NA,
                  3, 0, 1, 16, NA, NA, NA,
                  3, 0, 1, 16, NA, NA, NA,
                  0, 0, 134, 33, 39, 1, 14,
                  0, 0, 134, 33, 39, 1, 14), nrow = 7, ncol = 6)
 mat2 <- t(mat1)  ## see below
 ## Your text description says that `NA` values come at the end
 ## of *rows*, but your matrix has `NA` values at the end of
 ## *columns*, so I've transposed the matrix.
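Keeping the NAs does not lose the row-size information the question wants to track: counting the non-missing entries in each row recovers it directly. A minimal sketch, rebuilding the question's example matrix; `row_lengths` is just an illustrative name.

```r
# Rebuild the example matrix from the question and transpose it so that
# each year is one row (same layout as mat2 above).
mat1 <- matrix(c(3, 0, 1, 13, NA, NA, NA,
                 3, 0, 1, 13, NA, NA, NA,
                 3, 0, 1, 16, NA, NA, NA,
                 3, 0, 1, 16, NA, NA, NA,
                 0, 0, 134, 33, 39, 1, 14,
                 0, 0, 134, 33, 39, 1, 14), nrow = 7, ncol = 6)
mat2 <- t(mat1)

# Number of non-missing values per row: this records the different
# row sizes without replacing NA by 0.
row_lengths <- rowSums(!is.na(mat2))
print(row_lengths)
# [1] 4 4 4 4 7 7
```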

Since your stated goal is to

write a table or csv file without the NA-s

the correct answer (as hinted at by a now-deleted comment) is to use write.csv(...,na=""): from ?write.csv,

na: the string to use for missing values in the data.
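A minimal sketch of the `write.csv(..., na = "")` approach, writing to a temporary file (the file location is only for illustration) and reading the lines back to show that the missing values become empty fields:

```r
# Example data from the question, one year per row.
mat1 <- matrix(c(3, 0, 1, 13, NA, NA, NA,
                 3, 0, 1, 13, NA, NA, NA,
                 3, 0, 1, 16, NA, NA, NA,
                 3, 0, 1, 16, NA, NA, NA,
                 0, 0, 134, 33, 39, 1, 14,
                 0, 0, 134, 33, 39, 1, 14), nrow = 7, ncol = 6)
mat2 <- t(mat1)

# na = "" writes missing values as empty fields instead of "NA";
# row.names = FALSE omits the leading row-number column.
out_file <- tempfile(fileext = ".csv")
write.csv(mat2, out_file, na = "", row.names = FALSE)

lines <- readLines(out_file)
print(lines[2])
# [1] "3,0,1,13,,,"
```

Note that every row still has seven fields; the shorter years just end in empty ones (`3,0,1,13,,,`), which Python's `csv` module reads as empty strings that the Python side would still need to drop.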

More generally, if you wanted to pass rows to Python one at a time, you could use one of the following strategies:

  • use na.omit() to strip out NA values:
for (i in seq_len(nrow(mat2)))
    call_my_python_code(na.omit(mat2[i, ]))

or

apply(mat2, 1, function(x) call_my_python_code(na.omit(x)))
  • store the data as a list, either from the very beginning or by splitting it into a list (you still have to get rid of the NA values):
my_list <- split(mat2,row(mat2))
my_list <- lapply(my_list,na.omit)
lapply(my_list,call_my_python_code)
  • store the data in long format and use plyr or dplyr tools to operate on chunks ...
library(reshape2)
mat3 <- na.omit(melt(mat2))
mat3[mat3$Var1==1,]  ## row 1
library(plyr)
dlply(mat3,"Var1",function(x) call_my_python_code(x$value))
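If you want the file itself to contain vectors of different lengths (as the question prefers), a variant of the `na.omit()` strategy is to write one comma-separated line per row with the NAs dropped, so no trailing empty fields remain. This is a sketch of an alternative, not part of the answer's code; the temporary file stands in for whatever path the Python code reads.

```r
# Example data from the question, one year per row.
mat1 <- matrix(c(3, 0, 1, 13, NA, NA, NA,
                 3, 0, 1, 13, NA, NA, NA,
                 3, 0, 1, 16, NA, NA, NA,
                 3, 0, 1, 16, NA, NA, NA,
                 0, 0, 134, 33, 39, 1, 14,
                 0, 0, 134, 33, 39, 1, 14), nrow = 7, ncol = 6)
mat2 <- t(mat1)

# For each row, drop the NAs and join the remaining values with commas,
# so shorter years produce shorter lines.
ragged <- apply(mat2, 1, function(x) paste(na.omit(x), collapse = ","))

out_file <- tempfile(fileext = ".txt")
writeLines(ragged, out_file)
print(readLines(out_file)[1])
# [1] "3,0,1,13"
```

On the Python side, each line can then be split on commas to get a list whose length matches that year's number of data points.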

Upvotes: 5
