user24318

Reputation: 485

scraping text file in R

I am trying to extract some information from the following text file using R. I need the line "Processor: SPARC T5", which appears under the heading "Hardware Systems" and then under the subheading "Java EE AppServer & Database Server HW (SUT hardware)". The code below matches the expression but returns every Processor line in the file. I have 50 different text files like this and need to extract the same information from each of them. How do I extract only the Processor line under the "Hardware Systems" and "Java EE AppServer & Database Server HW (SUT hardware)" headings?

a <- readLines("http://spec.org/jEnterprise2010/results/res2013q3/jEnterprise2010-20130904-00045.txt")
b <- grep("Processor:", a)
c <- a[b]
c
# [1] "  Processor:         SPARC T5"         "  Processor:         Intel Xeon X5670"

Upvotes: 0

Views: 894

Answers (1)

MrFlick

Reputation: 206496

You can narrow the search to the section you care about: find the section header, then find where the section ends by looking for the next line that is not indented, and search only the lines in between. For example:

sectionmarker <- "Java EE AppServer & Database Server HW (SUT hardware)"
s <- grep(sectionmarker, a, fixed = TRUE)
e <- grep("^\\S", a[-(1:s)])[1]
grep("Processor", a[(s+1):(s+e-1)], fixed = TRUE, value = TRUE)[1]
# [1] "  Processor:         SPARC T5"
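
Since the question mentions 50 files, the same steps could be wrapped in a function and applied to each file. The following is only a sketch: `extract_processor` is a hypothetical helper name, and `sample_lines` is a made-up, abbreviated stand-in for the real file (its second section heading is invented for illustration) so that the example runs without network access.

```r
# Hypothetical helper: returns the processor name from the first
# "Processor:" line inside the indented block under the given header.
extract_processor <- function(lines, sectionmarker) {
  # Line number of the section header (first match only)
  s <- grep(sectionmarker, lines, fixed = TRUE)[1]
  if (is.na(s)) return(NA_character_)

  # Everything after the header
  rest <- lines[seq_len(length(lines) - s) + s]

  # The section ends at the first following line that is not indented
  e <- grep("^\\S", rest)[1]
  if (is.na(e)) e <- length(rest) + 1

  # First "Processor:" line within the section, with the label stripped
  hit <- grep("Processor:", rest[seq_len(e - 1)], fixed = TRUE, value = TRUE)[1]
  trimws(sub("Processor:", "", hit, fixed = TRUE))
}

# Made-up, abbreviated sample standing in for one results file
sample_lines <- c(
  "Hardware Systems",
  "Java EE AppServer & Database Server HW (SUT hardware)",
  "  Processor:         SPARC T5",
  "Some Other Section (hypothetical heading)",
  "  Processor:         Intel Xeon X5670"
)

sectionmarker <- "Java EE AppServer & Database Server HW (SUT hardware)"
result <- extract_processor(sample_lines, sectionmarker)
print(result)
```

For the real files, one option would be to collect the URLs in a character vector (say `urls`, also a hypothetical name) and call `sapply(urls, function(u) extract_processor(readLines(u), sectionmarker))`. This assumes every file uses the same section header and the same indentation convention; files where the header is missing yield `NA`.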

Upvotes: 2
