Reputation: 13103
I have a text file with unequal elements among rows. Sometimes the second column contains data, sometimes it contains NA
and sometimes there is nothing recorded at all. I know that if there are only 4 elements in a row I should insert an NA
as the element in the second column. However, I do not know how to do this. Here is an example data set:
abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*
Here is the desired result:
desired.result <- read.table(text = '
Name1 Name2 Name3 Name4 Status
abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef NA non qrr AE
fg.gg NA no qr E
aa.bb cc.dd ee ff A*
', header = TRUE)
I have not gotten far, but I have been able to split the data and input it into a matrix
with the following code. Of course, the data are misaligned.
setwd('c:/users/mmiller21/simple R programs')
my.data <- readLines('name_data.txt')
matrix(unlist(strsplit(unlist(my.data), " ")), ncol=5, byrow=TRUE)
# [,1] [,2] [,3] [,4] [,5]
# [1,] "abc.def" "ghi.jkl" "mno" "pqr" "A*"
# [2,] "bc.def" "NA" "no" "qr" "A"
# [3,] "c-e.ef" "non" "qrr" "AE" "fg.gg"
# [4,] "no" "qr" "E" "aa.bb" "cc.dd"
# [5,] "ee" "ff" "A*" "abc.def" "ghi.jkl"
Somehow I should count the number of elements after using strsplit(unlist(my.data), " ")
then insert NA
as the second element in every row containing only four elements. Then input the data into a matrix. Thank you for any help with this. I prefer base R.
Upvotes: 0
Views: 137
Reputation: 52637
Readlines, split by space characters, and append NA:
txt <- readLines(file)
t(sapply(strsplit(txt, "\\s+"), function(x) if(length(x) < 5) append(x, NA, 1) else x))
# [,1] [,2] [,3] [,4] [,5]
# [1,] "abc.def" "ghi.jkl" "mno" "pqr" "A*"
# [2,] "bc.def" "NA" "no" "qr" "A"
# [3,] "c-e.ef" NA "non" "qrr" "AE"
# [4,] "fg.gg" NA "no" "qr" "E"
# [5,] "aa.bb" "cc.dd" "ee" "ff" "A*"
Full version with data management:
file <- tempfile()
cat("abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*", "\n", sep="", file=file)
txt <- readLines(file)
t(sapply(strsplit(txt, "\\s+"), function(x) if(length(x) < 5) append(x, NA, 1) else x))
unlink(file)
Note this is similar to @Flodel
Upvotes: 1
Reputation: 263332
dat <- read.table(text="abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*", fill=TRUE, stringsAsFactors=FALSE)
names(dat) <- c('Name1' , 'Name2', 'Name3', 'Name4','Status')
is.na(dat[[5]]) <- dat[[5]]=="" # set blanks in col 5 to NA
t( apply( dat, 1, function(r) if( is.na( r[5]) ) {r[c(1,5,2:4)]}else {r}))
#---------
[,1] [,2] [,3] [,4] [,5]
[1,] "abc.def" "ghi.jkl" "mno" "pqr" "A*"
[2,] "bc.def" NA "no" "qr" "A"
[3,] "c-e.ef" NA "non" "qrr" "AE"
[4,] "fg.gg" NA "no" "qr" "E"
[5,] "aa.bb" "cc.dd" "ee" "ff" "A*"
Upvotes: 3
Reputation: 121568
You can use option fill=TRUE
, then translate the missed rows:
dat <- read.table(text='abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*',fill=TRUE)
t(apply(dat,1,function(x){
if(nchar(x[5])==0)
x= c(x[1],NA_character_,x[2:4])
x
}))
[,1] [,2] [,3] [,4] [,5]
[1,] "abc.def" "ghi.jkl" "mno" "pqr" "A*"
[2,] "bc.def" NA "no" "qr" "A"
[3,] "c-e.ef" NA "non" "qrr" "AE"
[4,] "fg.gg" NA "no" "qr" "E"
[5,] "aa.bb" "cc.dd" "ee" "ff" "A*"
Upvotes: 2
Reputation: 89057
Replace dat
with your filename:
dat <- textConnection("abc.def ghi.jkl mno pqr A*
bc.def NA no qr A
c-e.ef non qrr AE
fg.gg no qr E
aa.bb cc.dd ee ff A*")
my.lines <- readLines(dat)
my.rows <- strsplit(my.lines, " ")
adjust <- function(row) {
if (length(row) == 4) c(head(row, 1), NA, tail(row, 3))
else row
}
my.fixed <- lapply(my.rows, adjust)
out <- matrix(unlist(my.fixed), ncol = 5, byrow = TRUE)
out[out == "NA"] <- NA
Upvotes: 2