First Last

Reputation: 649

Making multiple plots in R from one textfile

I'm new to R and trying to generate a lot of graphs from one file, with headers between different data sets. I have a tab-delimited plaintext file, formatted like this:

Header: Boston city data
Month    Data1    Data2    Data3
1        1.5      9.1342   8.1231
2        12.3     12.31    1.129
3        (etc...)  

Header: Chicago city data
Month    Data1    Data2    Data3
1        1.5      9.1342   8.1231
2        12.3     12.31    1.129
...

I would like to create a graph of month vs Data1, month vs Data2, and month vs Data3, for each city.

I know that in Python I could iterate through each line, do something different if the line starts with 'Header', and then process the numbers. In R, I would like to do something like this:

for (data block starting with header) in inf:
    data = read.delim()
    barplot(data, main=header, ylab="Data1", xlab="Month")
    # repeat for Data2, Data3

but I'm not sure how to actually iterate through the file, or if I should just split up my file by city into lots of small files, then run through a list of small files to read.

Upvotes: 0

Views: 1219

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193667

Here is a slightly modified version of the function referred to in my comment.

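read.funkyfile = function(funkyfile, expression, ...) {
  # read the whole file and locate the lines that match the header pattern
  temp = readLines(funkyfile)
  temp.loc = grep(expression, temp)
  # add a sentinel one past the last line so the final block has an end
  temp.loc = c(temp.loc, length(temp)+1)
  # strip ": " and the header keyword to get a name for each block
  temp.nam = gsub("[[:punct:]][[:space:]]", "", 
                  grep(expression, temp, value=TRUE))
  temp.nam = gsub(expression, "", temp.nam)
  temp.out = vector("list", length(temp.nam))

  # seq_along() avoids the 1:0 trap when no header line is found
  for (i in seq_along(temp.nam)) {
    block = temp[seq(from = temp.loc[i]+1, to = temp.loc[i+1]-1)]
    # extra arguments (header, sep, ...) are passed on to read.table
    temp.out[[i]] = read.table(text = block, ...)
  }
  names(temp.out) = temp.nam
  temp.out
}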

Assuming your file is named "File.txt", load the function and read in the data like this. You can add any of the arguments to read.table that you need to:

temp = read.funkyfile("File.txt", "Header", header=TRUE, sep="\t")
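If you want to try the idea without a file on disk, here is a minimal sketch that builds the same kind of named list from an in-memory character vector. It is not the function above: it swaps in split() with cumsum() to cut the lines into blocks, and the sample lines and the names txt, blocks and sketch are made up for illustration.

```r
# Made-up sample in the question's layout (tab-delimited), kept in memory
txt <- c("Header: Boston city data",
         "Month\tData1\tData2\tData3",
         "1\t1.5\t9.1342\t8.1231",
         "2\t12.3\t12.31\t1.129",
         "",
         "Header: Chicago city data",
         "Month\tData1\tData2\tData3",
         "1\t1.5\t9.1342\t8.1231",
         "2\t12.3\t12.31\t1.129")

# Every header line starts a new group: cumsum() turns the TRUE/FALSE
# flags into a running block number (1 for Boston, 2 for Chicago)
is.header <- grepl("^Header", txt)
blocks <- split(txt, cumsum(is.header))

# The first line of each block is the header; the rest is a small table.
# read.table skips the blank separator line by default.
sketch <- lapply(blocks, function(b) {
  read.table(text = b[-1], header = TRUE, sep = "\t")
})
names(sketch) <- sub("^Header: ", "", txt[is.header])

str(sketch)
```

The result is a list with one data frame per city, named after the header lines, which is the shape the plotting step expects.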

Now, plot:

# to plot everything on one page (used for this example), uncomment the next line
# par(mfcol = c(length(temp), 1)) 
# invisible() keeps the list of bar midpoints returned by lapply from printing
invisible(lapply(names(temp), function(x) barplot(as.matrix(temp[[x]][-1]), 
                                        beside=TRUE, main=x, 
                                        legend.text=TRUE)))
# dev.off() or par(mfcol = c(1, 1)) if par was modified

Here's what your small sample data look like with par(mfcol = c(length(temp), 1)):

[image: bar plots of the sample data, one panel per city]
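If you would rather have a separate plot of Month against each of Data1, Data2 and Data3 per city, as the question asks, something along these lines might work. This is only a sketch: the cities list is a hypothetical stand-in for the list returned by the function above, and pdf(NULL) is used so the example runs without opening a window.

```r
# Hypothetical stand-in for the list read from the file: one data frame per city
cities <- list(
  "Boston city data"  = data.frame(Month = 1:2, Data1 = c(1.5, 12.3),
                                   Data2 = c(9.1342, 12.31),
                                   Data3 = c(8.1231, 1.129)),
  "Chicago city data" = data.frame(Month = 1:2, Data1 = c(1.5, 12.3),
                                   Data2 = c(9.1342, 12.31),
                                   Data3 = c(8.1231, 1.129))
)

pdf(NULL)  # null device; drop this line and dev.off() to plot on screen
# one row of panels per city, one column per data series
op <- par(mfrow = c(length(cities), 3))
n.plots <- 0
for (city in names(cities)) {
  d <- cities[[city]]
  for (col in setdiff(names(d), "Month")) {
    # months label the bars, the column name labels the y axis
    barplot(d[[col]], names.arg = d$Month, main = city,
            xlab = "Month", ylab = col)
    n.plots <- n.plots + 1
  }
}
par(op)  # restore the previous layout
invisible(dev.off())
```

With two cities and three data columns this draws six panels.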

Upvotes: 2

sgibb

Reputation: 25736

You could use a combination of gsub, grep and strsplit:

## get city name
nameSet <- function(x) {
    return(gsub(pattern="Header: (.*) city data", replacement="\\1", x=x))
}

## extract monthly numbers
singleSet <- function(x) {
    l <- lapply(x, function(y) {
        ## split single line by spaces
        s <- strsplit(y, "[[:space:]]+")
        ## turn characters into doubles
        return(as.double(s[[1]]))
    })
    ## turn list into a matrix
    m <- do.call(rbind, l)
    return(m) 
}

## read file
con <- file("data.txt", "r")
lines <- readLines(con)
close(con)

## determine header lines and calculate begin/end lines for each dataset
headerLines <- grep(pattern="^Header", x=lines)
beginLines <- headerLines+2
endLines <- c(headerLines[-1]-1, length(lines))

## layout plotting region
par(mfrow=c(length(beginLines), 3))

## loop through all datasets
for (i in seq(along=headerLines)) {
    city <- nameSet(lines[headerLines[i]])
    data <- singleSet(lines[beginLines[i]:endLines[i]])

    for (j in 2:ncol(data)) {
        ## label each bar with its month (first column)
        barplot(data[,j], names.arg=data[,1], main=city,
                xlab="Month", ylab=paste0("Data", j-1))
    }
}
par(mfrow=c(1, 1))

barplots
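To see what the index arithmetic and the two helpers do, here is a small sketch on an in-memory vector. The sample lines are made up in the question's layout, and the names city and vals are only for illustration.

```r
# Made-up lines in the question's layout; a blank line separates the blocks
lines <- c("Header: Boston city data",
           "Month    Data1    Data2    Data3",
           "1        1.5      9.1342   8.1231",
           "2        12.3     12.31    1.129",
           "",
           "Header: Chicago city data",
           "Month    Data1    Data2    Data3",
           "1        1.5      9.1342   8.1231")

# positions of the header lines
headerLines <- grep(pattern = "^Header", x = lines)
# data start two lines after each header (skipping the column names)
beginLines <- headerLines + 2
# each block ends right before the next header; the last ends at the last line
endLines <- c(headerLines[-1] - 1, length(lines))

# city name captured from the header lines
city <- gsub(pattern = "Header: (.*) city data", replacement = "\\1",
             x = lines[headerLines])

# one data line split on runs of whitespace and converted to numbers
vals <- as.double(strsplit(lines[3], "[[:space:]]+")[[1]])

print(rbind(headerLines, beginLines, endLines))
print(city)
print(vals)
```

Note that the first block's range runs up to the blank separator line, since it ends one line before the next header.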

Upvotes: 4
