Reputation: 649
I'm new to R and trying to generate a lot of graphs from one file, with headers between different data sets. I have a tab-delimited plaintext file, formatted like this:
Header: Boston city data
Month Data1 Data2 Data3
1 1.5 9.1342 8.1231
2 12.3 12.31 1.129
3 (etc...)
Header: Chicago city data
Month Data1 Data2 Data3
1 1.5 9.1342 8.1231
2 12.3 12.31 1.129
...
I would like to create a graph of month vs Data1, month vs Data2, and month vs Data3, for each city.
I know that in Python I could iterate through each line, do something different if the line starts with 'Header', and then somehow process the numbers. In R, I would like to do something like this:
for (data block starting with header) in inf:
data = read.delim()
barplot(data, main=header, ylab="Data1", xlab="Month")
# repeat for Data2, Data3
but I'm not sure how to actually iterate through the file, or if I should just split up my file by city into lots of small files, then run through a list of small files to read.
Upvotes: 0
Views: 1219
Reputation: 193667
Here is a slightly modified version of the function referred to in my comment.
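read.funkyfile = function(funkyfile, expression, ...) {
  temp = readLines(funkyfile)
  # line numbers of the header lines, plus a sentinel one past the last line
  temp.loc = grep(expression, temp)
  temp.loc = c(temp.loc, length(temp)+1)
  # data-set names: header lines with punctuation-plus-space (e.g. ": ")
  # and the matched expression removed
  temp.nam = gsub("[[:punct:]][[:space:]]", "",
                  grep(expression, temp, value=TRUE))
  temp.nam = gsub(expression, "", temp.nam)
  temp.out = vector("list")
  # seq_along() also behaves correctly when no header line was found
  for (i in seq_along(temp.nam)) {
    # read the lines between this header and the next one
    temp.out[[i]] = read.table(textConnection(
      temp[seq(from = temp.loc[i]+1,
               to = temp.loc[i+1]-1)]),
      ...)
    names(temp.out)[i] = temp.nam[i]
  }
  temp.out
}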
Assuming your file is named "File.txt", load the function and read in the data like this. You can add any of the arguments to read.table that you need to:
temp = read.funkyfile("File.txt", "Header", header=TRUE, sep="\t")
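If you want to see the shape of the result without a file on disk, here is a minimal sketch of the same idea on inline data. It is not the function above: it swaps in cumsum() over a header test to number the blocks, and the names sample_lines, is_header, block_id, blocks and cities are made up for this illustration. It assumes every block starts with a line beginning with "Header" and is tab-delimited.

```r
# Hypothetical sample mirroring the question's layout (tab-delimited).
sample_lines <- c(
  "Header: Boston city data",
  "Month\tData1\tData2\tData3",
  "1\t1.5\t9.1342\t8.1231",
  "2\t12.3\t12.31\t1.129",
  "Header: Chicago city data",
  "Month\tData1\tData2\tData3",
  "1\t1.5\t9.1342\t8.1231",
  "2\t12.3\t12.31\t1.129"
)

# Each header line starts a new block; cumsum() turns that into a block id.
is_header <- grepl("^Header", sample_lines)
block_id  <- cumsum(is_header)

# Split the non-header lines by block and parse each block as its own table.
blocks <- split(sample_lines[!is_header], block_id[!is_header])
cities <- lapply(blocks, function(b) {
  read.table(text = b, header = TRUE, sep = "\t")
})

# Name each data frame after its header line, minus the "Header: " prefix.
names(cities) <- sub("^Header:[[:space:]]*", "", sample_lines[is_header])

str(cities)
```

The result is a named list with one data frame per city, which is the same kind of object the plotting code expects in temp.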
Now, plot:
# to plot everything on one page (used for this example), uncomment the next line
# par(mfcol = c(length(temp), 1))
lapply(names(temp), function(x) barplot(as.matrix(temp[[x]][-1]),
                                        beside=TRUE, main=x,
                                        legend=TRUE))
# dev.off() or par(mfcol = c(1, 1)) if par was modified
With par(mfcol = c(length(temp), 1)), your small sample data are drawn as one grouped bar chart per city, stacked in a single column.
Upvotes: 2
Reputation: 25736
You could use a combination of gsub, grep and strsplit:
## get city name
nameSet <- function(x) {
  return(gsub(pattern="Header: (.*) city data", replacement="\\1", x=x))
}
## extract monthly numbers
singleSet <- function(x) {
  l <- lapply(x, function(y) {
    ## split single line by spaces
    s <- strsplit(y, "[[:space:]]+")
    ## turn characters into doubles
    return(as.double(s[[1]]))
  })
  ## turn list into a matrix
  m <- do.call(rbind, l)
  return(m)
}
## read file
con <- file("data.txt", "r")
lines <- readLines(con)
close(con)
## determine header lines and calculate begin/end lines for each dataset
headerLines <- grep(pattern="^Header", x=lines)
beginLines <- headerLines+2
endLines <- c(headerLines[-1]-1, length(lines))
## layout plotting region
par(mfrow=c(length(beginLines), 3))
## loop through all datasets
for (i in seq(along=headerLines)) {
  city <- nameSet(lines[headerLines[i]])
  data <- singleSet(lines[beginLines[i]:endLines[i]])
  for (j in 2:ncol(data)) {
    barplot(data[,j], main=city, xlab="Month", ylab=paste("Data", j-1))
  }
}
par(mfrow=c(1, 1))
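The begin/end arithmetic above is easy to check by hand on a small sample. This is only a sketch on made-up lines (sample_lines is a hypothetical stand-in for the contents of data.txt), assuming each header is followed by exactly one column-name line:

```r
# Hypothetical stand-in for readLines("data.txt"): two cities, two months each.
sample_lines <- c(
  "Header: Boston city data",
  "Month Data1 Data2 Data3",
  "1 1.5 9.1342 8.1231",
  "2 12.3 12.31 1.129",
  "Header: Chicago city data",
  "Month Data1 Data2 Data3",
  "1 1.5 9.1342 8.1231",
  "2 12.3 12.31 1.129"
)

headerLines <- grep(pattern="^Header", x=sample_lines)
## skip the header line and the column-name line
beginLines <- headerLines + 2
## each block ends one line before the next header; the last ends at EOF
endLines <- c(headerLines[-1] - 1, length(sample_lines))

print(data.frame(header=headerLines, begin=beginLines, end=endLines))
```

Here the headers sit on lines 1 and 5, so the numeric rows run from 3 to 4 for Boston and from 7 to 8 for Chicago.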
Upvotes: 4