behas

Reputation: 3476

Merging several data frames into a single expanded frame

I have a list of data frames, where each frame contains the same kind of measurements for a single system. E.g.,

$system1                           
                file    cumSize     cumloadTime     query1
1  ../data/data1.dat    100000      158.1000        0.4333333
2  ../data/data2.dat    200000      394.9000        0.5000000
3  ../data/data3.dat    250000      561.8667        0.6666667

$system2                           
                file    cumSize     cumloadTime     query1
1  ../data/data1.dat    100000      120.1000        0.4333333
2  ../data/data2.dat    200000      244.9000        0.4500000
3  ../data/data3.dat    250000      261.8667        0.2666667

Now I would like to display several aspects of these data frames in separate plots using the matplot command. Therefore I need to transform the above input data structure into the following output structure:

$cumloadTime

cumSize     system1     system2
100000      158.1000    120.1000
200000      394.9000    244.9000
250000      561.8667    261.8667

$query1

cumSize     system1     system2
100000      0.4333333   0.4333333
200000      0.5000000   0.4500000
250000      0.6666667   0.2666667

I played around with the reshape, merge, and melt functions but haven't found a solution yet.

Thanks for any hints...

Upvotes: 9

Views: 2692

Answers (2)

kohske

Reputation: 66842

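You can use melt, cast, and ldply, and, as Richie suggested, ggplot2.

Beforehand, load the packages (older ggplot2 releases attached reshape and plyr automatically; current ones do not):

library(plyr)    # ldply
library(reshape) # melt, cast
library(ggplot2)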

First, to use matplot:

d2 <- ldply(data_list)  # stack the frames; .id holds the system name
d.cum <- cast(d2, cumSize ~ .id, value = "cumloadTime")
d.que <- cast(d2, cumSize ~ .id, value = "query1")
matplot(d.cum, type="l")
matplot(d.que, type="l")
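If you would rather avoid the extra packages, the structure asked for in the question can also be built in base R. The following is only a sketch under stated assumptions: the input is rebuilt by hand from the question's values, `widen_measure` is a hypothetical helper name, and `cumSize` is assumed to be the key shared by all systems.

```r
# Rebuild the example input from the question (values copied verbatim).
files <- c("../data/data1.dat", "../data/data2.dat", "../data/data3.dat")
sizes <- c(100000, 200000, 250000)
data_list <- list(
  system1 = data.frame(file = files, cumSize = sizes,
                       cumloadTime = c(158.1000, 394.9000, 561.8667),
                       query1 = c(0.4333333, 0.5000000, 0.6666667)),
  system2 = data.frame(file = files, cumSize = sizes,
                       cumloadTime = c(120.1000, 244.9000, 261.8667),
                       query1 = c(0.4333333, 0.4500000, 0.2666667))
)

# Hypothetical helper: one wide frame for `measure`, one column per system.
widen_measure <- function(frames, measure, key = "cumSize") {
  pieces <- lapply(names(frames), function(nm) {
    piece <- frames[[nm]][, c(key, measure)]
    names(piece)[2] <- nm  # name the measure column after its system
    piece
  })
  # Merge pairwise on the key so rows are aligned by cumSize, not by position.
  Reduce(function(x, y) merge(x, y, by = key, all = TRUE), pieces)
}

measures <- c("cumloadTime", "query1")
wide <- lapply(setNames(measures, measures), widen_measure, frames = data_list)

# Plot one line per system against cumSize.
matplot(wide$cumloadTime$cumSize, as.matrix(wide$cumloadTime[, -1]),
        type = "l", xlab = "cumSize", ylab = "cumloadTime")
```

Here `wide$cumloadTime` and `wide$query1` each hold the columns cumSize, system1, and system2. Passing cumSize as the x argument keeps it on the horizontal axis instead of drawing it as a third line.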

In my opinion, ggplot2 will work better:

d3 <- melt(d2, measure.vars=c("cumloadTime", "query1"))
ggplot(d3, aes(cumSize, value, colour=.id)) + geom_line() + 
  facet_wrap(~variable, nrow=2, scales="free_y")

Upvotes: 7

Richie Cotton

Reputation: 121067

Use rbind to create one data frame containing everything.

data_list <- list()
data_list[["system1"]] <- read.table(tc <- textConnection("file    cumSize     cumloadTime     query1
1  ../data/data1.dat    100000      158.1000        0.4333333
2  ../data/data2.dat    200000      394.9000        0.5000000
3  ../data/data3.dat    250000      561.8667        0.6666667"), header = TRUE); close(tc)

data_list[["system2"]] <- read.table(tc <- textConnection("file    cumSize     cumloadTime     query1
1  ../data/data1.dat    100000      120.1000        0.4333333
2  ../data/data2.dat    200000      244.9000        0.4500000
3  ../data/data3.dat    250000      261.8667        0.2666667"), header = TRUE); close(tc)

for(n in names(data_list)) data_list[[n]]$system <- n

all_data <- do.call(rbind, data_list)

Forget matplot and use ggplot2 instead, e.g.,

library(ggplot2)
p1 <- ggplot(all_data, aes(cumSize, cumloadTime, color = system)) + geom_line(); p1
p2 <- ggplot(all_data, aes(cumSize, query1, color = system)) + geom_line(); p2
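If the wide layout from the question is still wanted, a stacked frame like all_data can probably be spread back out with base R's reshape(). This is a rough sketch rather than a tested recipe for your real data: the frame below is a hand-built stand-in for all_data (without the file column), and `to_wide` is a hypothetical helper name.

```r
# Stand-in for all_data: the question's values, stacked, with a system column.
all_data <- data.frame(
  cumSize = rep(c(100000, 200000, 250000), times = 2),
  cumloadTime = c(158.1000, 394.9000, 561.8667, 120.1000, 244.9000, 261.8667),
  query1 = c(0.4333333, 0.5000000, 0.6666667, 0.4333333, 0.4500000, 0.2666667),
  system = rep(c("system1", "system2"), each = 3)
)

# Hypothetical helper: rows keyed by cumSize, one column per system.
to_wide <- function(measure) {
  wide <- reshape(all_data[, c("cumSize", "system", measure)],
                  idvar = "cumSize", timevar = "system", direction = "wide")
  # reshape() names the new columns "<measure>.<system>"; strip the prefix.
  names(wide) <- sub(paste0("^", measure, "\\."), "", names(wide))
  rownames(wide) <- NULL
  wide
}

wide_list <- list(cumloadTime = to_wide("cumloadTime"),
                  query1 = to_wide("query1"))
```

Each element of `wide_list` then has the columns cumSize, system1, and system2, which is the shape matplot expects once cumSize is passed separately as x.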

Upvotes: 9
