Mark Miller
Mark Miller

Reputation: 13103

insert NA into data read as strings

I have a text file with unequal elements among rows. Sometimes the second column contains data, sometimes it contains NA and sometimes there is nothing recorded at all. I know that if there are only 4 elements in a row I should insert an NA as the element in the second column. However, I do not know how to do this. Here is an example data set:

abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*

Here is the desired result:

desired.result <- read.table(text = '
  Name1   Name2  Name3 Name4 Status
abc.def ghi.jkl    mno   pqr     A*
 bc.def      NA     no    qr      A
 c-e.ef      NA    non   qrr     AE
  fg.gg      NA     no    qr      E
  aa.bb   cc.dd     ee    ff     A*
', header = TRUE)

I have not gotten far, but I have been able to split the data and input it into a matrix with the following code. Of course, the data are misaligned.

setwd('c:/users/mmiller21/simple R programs')

my.data <- readLines('name_data.txt')

matrix(unlist(strsplit(unlist(my.data), " ")), ncol=5, byrow=TRUE)

#      [,1]      [,2]      [,3]  [,4]      [,5]     
# [1,] "abc.def" "ghi.jkl" "mno" "pqr"     "A*"     
# [2,] "bc.def"  "NA"      "no"  "qr"      "A"      
# [3,] "c-e.ef"  "non"     "qrr" "AE"      "fg.gg"  
# [4,] "no"      "qr"      "E"   "aa.bb"   "cc.dd"  
# [5,] "ee"      "ff"      "A*"  "abc.def" "ghi.jkl"

Somehow I should count the number of elements after using strsplit(unlist(my.data), " ") then insert NA as the second element in every row containing only four elements. Then input the data into a matrix. Thank you for any help with this. I prefer base R.

Upvotes: 0

Views: 137

Answers (4)

BrodieG
BrodieG

Reputation: 52637

Readlines, split by space characters, and append NA:

txt <- readLines(file)
t(sapply(strsplit(txt, "\\s+"), function(x) if(length(x) < 5) append(x, NA, 1) else x))
#      [,1]      [,2]      [,3]  [,4]  [,5]  
# [1,] "abc.def" "ghi.jkl" "mno" "pqr" "A*"
# [2,] "bc.def"  "NA"      "no"  "qr"  "A" 
# [3,] "c-e.ef"  NA        "non" "qrr" "AE"
# [4,] "fg.gg"   NA        "no"  "qr"  "E" 
# [5,] "aa.bb"   "cc.dd"   "ee"  "ff"  "A*"

Full version with data management:

file <- tempfile()
cat("abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*", "\n", sep="", file=file)
txt <- readLines(file)
t(sapply(strsplit(txt, "\\s+"), function(x) if(length(x) < 5) append(x, NA, 1) else x))
unlink(file)

Note this is similar to @Flodel

Upvotes: 1

IRTFM
IRTFM

Reputation: 263332

dat <- read.table(text="abc.def ghi.jkl mno pqr A*
 bc.def NA no qr A
 c-e.ef non qrr AE
 fg.gg no qr E
 aa.bb cc.dd ee ff A*", fill=TRUE, stringsAsFactors=FALSE)
 names(dat) <- c('Name1'  , 'Name2',  'Name3', 'Name4','Status')
 is.na(dat[[5]]) <- dat[[5]]==""   # set blanks in col 5 to NA
 t( apply( dat, 1, function(r) if( is.na( r[5]) ) {r[c(1,5,2:4)]}else {r}))
 #---------
     [,1]      [,2]      [,3]  [,4]  [,5]
[1,] "abc.def" "ghi.jkl" "mno" "pqr" "A*"
[2,] "bc.def"  NA        "no"  "qr"  "A" 
[3,] "c-e.ef"  NA        "non" "qrr" "AE"
[4,] "fg.gg"   NA        "no"  "qr"  "E" 
[5,] "aa.bb"   "cc.dd"   "ee"  "ff"  "A*"

Upvotes: 3

agstudy
agstudy

Reputation: 121568

You can use option fill=TRUE , then translate the missed rows:

dat <- read.table(text='abc.def ghi.jkl mno pqr A*
  bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*',fill=TRUE)

t(apply(dat,1,function(x){
  if(nchar(x[5])==0)
    x= c(x[1],NA_character_,x[2:4])
  x
}))

   [,1]      [,2]      [,3]  [,4]  [,5]
[1,] "abc.def" "ghi.jkl" "mno" "pqr" "A*"
[2,] "bc.def"  NA        "no"  "qr"  "A" 
[3,] "c-e.ef"  NA        "non" "qrr" "AE"
[4,] "fg.gg"   NA        "no"  "qr"  "E" 
[5,] "aa.bb"   "cc.dd"   "ee"  "ff"  "A*"

Upvotes: 2

flodel
flodel

Reputation: 89057

Replace dat with your filename:

dat <- textConnection("abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*")

my.lines  <- readLines(dat)
my.rows   <- strsplit(my.lines, " ")
adjust    <- function(row) {
   if (length(row) == 4) c(head(row, 1), NA, tail(row, 3))
   else row
}
my.fixed  <- lapply(my.rows, adjust)

out <- matrix(unlist(my.fixed), ncol = 5, byrow = TRUE)
out[out == "NA"] <- NA

Upvotes: 2

Related Questions