Reputation: 2538
I have a file, file.txt, with data that looks as follows:
<z:row ows_Req_Name1='John' ows_ReqPriority='High' ows_ReqDate='2012-10-10' />
<z:row ows_Req_Name1='Jack' ows_ReqPriority='Low' ows_ReqDate='2012-11-10' />
<z:row ows_Req_Name1='John' ows_ReqDate='2012-12-10' />
Not all lines contain all the required information. For example, the last line above has no entry for ReqPriority, unlike the other lines. I split the data into a data frame using:
data.frame(do.call(rbind,strsplit(readLines('file.txt'),'ows_',fixed=T)))
but because some lines are missing entries, the data frame does not come out properly.
Any suggestions on how I can read this into a data frame and fill in the missing values with NA? The desired output is:
Req_Name1 ReqPriority    ReqDate
     John        High 2012-10-10
     Jack         Low 2012-11-10
     John          NA 2012-12-10
Upvotes: 0
Views: 293
Reputation: 109844
flodel's response is better, but I got to playing and used some regex (I'm a bit rusty with regex right now, so this was good practice):
Read in your data:
x <- readLines(n=3)
<z:row ows_Req_Name1='John' ows_ReqPriority='High' ows_ReqDate='2012-10-10' />
<z:row ows_Req_Name1='Jack' ows_ReqPriority='Low' ows_ReqDate='2012-11-10' />
<z:row ows_Req_Name1='John' ows_ReqDate='2012-12-10' />
Reconstructing the data:
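# Split each line into its key='value' fields, dropping the leading "<z:row"
new <- lapply(strsplit(x, " ows_| />"), "[", -1)
# Strip the single quotes around the values
new <- lapply(new, function(x) gsub("'", "", x))
# Put the fields in a fixed order; a missing field becomes NA
tester <- function(x) {
  x[match(c("Req_", "ReqP", "ReqD"), substring(x, 1, 4))]
}
# Keep only the values (the text after "=")
new2 <- lapply(lapply(new, tester), function(x) {
  gsub("^\\s+|\\s+$", "", gsub(".*=", " ", x))
})
DF <- data.frame(do.call(rbind, new2))
# Keep only the field names (the text before "="), dropping NAs
n <- lapply(lapply(new, tester), function(x) {
  na.omit(gsub("^\\s+|\\s+$", "", gsub("=.+.", " ", x)))
})
# Take the column names from the most complete row
colnames(DF) <- n[[which.max(sapply(n, length))]]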
DF
Outputs this:
  Req_Name1 ReqPriority    ReqDate
1      John        High 2012-10-10
2      Jack         Low 2012-11-10
3      John        <NA> 2012-12-10
Upvotes: 2
Reputation: 89057
Since each row looks a lot like how data.frames are created in R, I thought it would be fun to work it this way:
x <- readLines('file.txt')
x <- gsub("<z:row (.*) />", "data.frame(\\1)", x)
x <- gsub("ows_", "", x)
x <- gsub(" ", ", ", x)
x
# [1] "data.frame(Req_Name1='John', ReqPriority='High', ReqDate='2012-10-10')"
# [2] "data.frame(Req_Name1='Jack', ReqPriority='Low', ReqDate='2012-11-10')"
# [3] "data.frame(Req_Name1='John', ReqDate='2012-12-10')"
library(plyr)
do.call(rbind.fill, lapply(x, function(z) eval(parse(text = z))))
#   Req_Name1 ReqPriority    ReqDate
# 1      John        High 2012-10-10
# 2      Jack         Low 2012-11-10
# 3      John        <NA> 2012-12-10
But it should come with the usual warnings about using eval/parse.
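For anyone who would rather avoid eval/parse altogether, here is a rough base R sketch of the same idea that extracts the name='value' pairs with a regex instead of evaluating the text. It assumes every attribute has the form ows_<name>='<value>' with no single quotes inside the value; parse_row is a hypothetical helper name, not something from a package, and the sample lines are copied from the question in place of readLines('file.txt').

```r
# Sample lines copied from the question; in practice use readLines('file.txt')
x <- c(
  "<z:row ows_Req_Name1='John' ows_ReqPriority='High' ows_ReqDate='2012-10-10' />",
  "<z:row ows_Req_Name1='Jack' ows_ReqPriority='Low' ows_ReqDate='2012-11-10' />",
  "<z:row ows_Req_Name1='John' ows_ReqDate='2012-12-10' />"
)

# Hypothetical helper: turn one line into a named character vector
parse_row <- function(line) {
  # Every ows_<name>='<value>' pair found on the line
  pairs <- regmatches(line, gregexpr("ows_[^=]+='[^']*'", line))[[1]]
  keys <- sub("^ows_([^=]+)=.*$", "\\1", pairs)
  vals <- sub("^[^=]+='([^']*)'$", "\\1", pairs)
  setNames(vals, keys)
}

rows <- lapply(x, parse_row)

# Column order follows first appearance across all rows
cols <- unique(unlist(lapply(rows, names)))

# Indexing a named vector by a name it lacks yields NA,
# which is what fills the missing entries
mat <- do.call(rbind, lapply(rows, function(r) unname(r[cols])))
colnames(mat) <- cols

DF <- as.data.frame(mat, stringsAsFactors = FALSE)
DF
```

All columns come back as character here, so ReqDate would still need as.Date() if real dates are wanted.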
Upvotes: 3