Chrisvdberge

Reputation: 1956

R: handling and plotting grouped data

This is a follow-up question to this one: R: plot multiple lines in one graph

In that question I used part of my data to draw one graph with multiple lines. Now I want to draw multiple graphs in one grid, since my data is grouped. Right now I do this by creating a data frame for each group, creating a graph for each data frame, and combining them with grid.arrange(). Could I instead handle the grouped data as one dataset rather than creating all those separate tables?

The data I have is structured like this:

          Category1    Category2    Category3
Company   2011   2013  2011   2013  2011   2013
Company1  300    350   290    300   295    290
Company2  320    430   305    301   300    400
Company3  310    420   400    305   400    410

So is there any way to process this in one go and plot the three graphs (one per category), each with a line per company across the years 2011 and 2013?

Upvotes: 5

Views: 11492

Answers (2)

Beasterfield

Reputation: 7113

You should definitely learn how to structure your data and how to make a reproducible example. It's really hard to deal with data in such an unstructured format, not only for you, but also for us.

mdf <- read.table( text="Company   2011   2013  2011   2013  2011   2013
Company1  300    350   290    300   295    290
Company2  320    430   305    301   300    400
Company3  310    420   400    305   400    410", header = TRUE, check.names=FALSE )

library("reshape2")
cat1 <- melt(mdf[c(1,2,3)], id.vars="Company", value.name="value", variable.name="Year")
cat1$Category <- "Category1"
cat2 <- melt(mdf[c(1,4,5)], id.vars="Company", value.name="value", variable.name="Year")
cat2$Category <- "Category2"
cat3 <- melt(mdf[c(1,6,7)], id.vars="Company", value.name="value", variable.name="Year")
cat3$Category <- "Category3"
mdf <- rbind(cat1, cat2, cat3)

head(mdf)
   Company Year value  Category
1 Company1 2011   300 Category1
2 Company2 2011   320 Category1
3 Company3 2011   310 Category1
4 Company1 2013   350 Category1
5 Company2 2013   430 Category1
6 Company3 2013   420 Category1

This can be automated, of course, if the number of categories is very large:

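library( "plyr" )
# Run this instead of the manual melt/rbind above: mdf must still be the wide table
n_cat <- (ncol(mdf) - 1) / 2   # Company column + one 2011/2013 pair per category
mdf <- ldply( seq_len(n_cat), function( i ){
  # category i occupies columns 2*i and 2*i + 1
  tmp <- melt(mdf[ c(1, i*2, i*2+1) ], id.vars="Company", value.name="value", variable.name="Year")
  tmp$Category <- paste0("Category", i)
  tmp
} )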
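If you'd rather not depend on reshape2 and plyr for this step, here is a minimal base-R sketch of the same reshape. It assumes the column layout shown above (Company followed by one 2011/2013 pair per category); the names `wide`, `n_cat` and `long` are just illustrative.

```r
# Wide table as in the question: Company, then a 2011/2013 pair per category
wide <- read.table(text = "Company   2011   2013  2011   2013  2011   2013
Company1  300    350   290    300   295    290
Company2  320    430   305    301   300    400
Company3  310    420   400    305   400    410", header = TRUE, check.names = FALSE)

n_cat <- (ncol(wide) - 1) / 2   # number of categories (two year columns each)

# Build one long data frame per category, then stack them
long <- do.call(rbind, lapply(seq_len(n_cat), function(i) {
  cols <- c(i * 2, i * 2 + 1)   # the 2011/2013 pair for category i
  data.frame(
    Company  = rep(wide$Company, times = 2),
    Year     = rep(names(wide)[cols], each = nrow(wide)),
    value    = unlist(wide[cols], use.names = FALSE),  # column-wise: 2011s, then 2013s
    Category = paste0("Category", i)
  )
}))

head(long)
```

Here `Year` comes out as character rather than as a factor, which is fine for the ggplot2 calls below because `group = Company` tells ggplot2 which points to connect.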

But if you can store the data in this long format from the start, instead of reshaping it back and forth, you should do so.
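As a small illustration of "long from the start", the same numbers can be recorded directly as one row per company, year and category. This sketch uses base R's `expand.grid()` to generate the key columns; the name `long` is only illustrative.

```r
# One row per Company x Year x Category; expand.grid varies Company fastest,
# then Year, then Category, so the values are listed in that same order
long <- expand.grid(
  Company  = c("Company1", "Company2", "Company3"),
  Year     = c("2011", "2013"),
  Category = c("Category1", "Category2", "Category3"),
  stringsAsFactors = FALSE
)
long$value <- c(300, 320, 310,  350, 430, 420,   # Category1: 2011, then 2013
                290, 305, 400,  300, 301, 305,   # Category2
                295, 300, 400,  290, 400, 410)   # Category3

# Subsetting a group is then a one-liner instead of a separate table
subset(long, Category == "Category2")
```

With data in this shape, both the faceted plot and the per-group list below work without any reshaping.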

Using facets

ggplot2 has built-in support for faceted plots that display data of the same type, as long as the data can be subset by one (or multiple) variables. See ?facet_wrap or ?facet_grid.

library("ggplot2")
ggplot(data=mdf, aes(x=Year, y=value, group = Company, colour = Company)) +
    geom_line() +
    geom_point( size=4, shape=21, fill="white") +
    facet_wrap( ~ Category )
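For comparison only, the same one-panel-per-category idea can be sketched with base R graphics, without ggplot2. This is a swapped-in technique (`split()` plus `par(mfrow = ...)` and `matplot()`), not what facet_wrap does internally; the data are rebuilt here so the block runs on its own.

```r
# Long-format data with the question's values, rebuilt so the block is self-contained
long <- expand.grid(
  Company  = c("Company1", "Company2", "Company3"),
  Year     = c(2011, 2013),
  Category = c("Category1", "Category2", "Category3"),
  stringsAsFactors = FALSE
)
long$value <- c(300, 320, 310, 350, 430, 420,
                290, 305, 400, 300, 301, 305,
                295, 300, 400, 290, 400, 410)

op <- par(mfrow = c(1, 3))                  # one row of three panels
for (cat_df in split(long, long$Category)) {
  # matrix with one row per year and one column per company
  w <- tapply(cat_df$value, list(cat_df$Year, cat_df$Company), sum)
  years <- as.numeric(rownames(w))
  matplot(years, w, type = "b", lty = 1, pch = 21, bg = "white",
          xaxt = "n", xlab = "Year", ylab = "value", main = cat_df$Category[1])
  axis(1, at = years)                       # label only the observed years
  legend("topleft", legend = colnames(w), col = seq_len(ncol(w)),
         lty = 1, pch = 21, bty = "n")
}
par(op)                                     # restore the previous layout
```

The ggplot2 version is shorter and shares scales and legends automatically; the base version mainly shows that "split by group, draw one panel each" is all faceting amounts to.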


Getting individual plots

Alternatively, you can split your data.frame by the corresponding variable and store the individual plots in a list:

library("plyr")
library("ggplot2")
ll <- dlply( mdf, "Category", function(x){
        ggplot(data=x, aes(x=Year, y=value, group = Company, colour = Company)) +
          geom_line() +
          geom_point( size=4, shape=21, fill="white")
})
ll[["Category1"]]
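Since the question combined per-group plots with grid.arrange(), a list like `ll` can be passed to it directly, so no separate data frames are needed. This sketch assumes the ggplot2 and gridExtra packages are installed, uses base `split()` in place of `dlply()`, and rebuilds the long data so it runs on its own.

```r
library("ggplot2")
library("gridExtra")

# Long-format data with the question's values (same layout as mdf above)
mdf <- expand.grid(
  Company  = c("Company1", "Company2", "Company3"),
  Year     = c("2011", "2013"),
  Category = c("Category1", "Category2", "Category3"),
  stringsAsFactors = FALSE
)
mdf$value <- c(300, 320, 310, 350, 430, 420,
               290, 305, 400, 300, 301, 305,
               295, 300, 400, 290, 400, 410)

# One ggplot per category, collected in a list named by category
ll <- lapply(split(mdf, mdf$Category), function(x) {
  ggplot(data = x, aes(x = Year, y = value, group = Company, colour = Company)) +
    geom_line() +
    geom_point(size = 4, shape = 21, fill = "white") +
    ggtitle(x$Category[1])
})

# Arrange all plots in one row; grobs= takes the whole list at once
grid.arrange(grobs = ll, ncol = 3)
```

Compared with facet_wrap, this gives each panel its own legend and y-axis range, which may or may not be what you want.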

Upvotes: 8

Adam Hyland

Reputation: 1057

At least for ggplot2, you'll want to use the reshape2 package to convert your data to a slightly different (long) format.

Let's imagine that you have a data.frame like this:

test <- structure(list(Company = structure(1:3, .Label = c("Company1", 
"Company2", "Company3"), class = "factor"), X2011.1 = c(300L, 
320L, 310L), X2013.1 = c(350L, 430L, 420L), X2011.2 = c(290, 
305, 400), X2013.2 = c(300, 301, 305), X2011.3 = c(295, 300, 
400), X2013.3 = c(290L, 400L, 410L)), .Names = c("Company", "X2011.1", 
"X2013.1", "X2011.2", "X2013.2", "X2011.3", "X2013.3"), class = "data.frame", row.names = c(NA, 
-3L))

Ignore the ugliness for now; printed, it looks like this:

  Company  X2011.1 X2013.1 X2011.2 X2013.2 X2011.3 X2013.3
  Company1     300     350     290     300     295     290
  Company2     320     430     305     301     300     400
  Company3     310     420     400     305     400     410

If we use the melt() function we can make it look like this:

library("reshape2")
melt(test) -> test.melt

test.melt

Using Company as id variables
    Company variable value
1  Company1  X2011.1   300
2  Company2  X2011.1   320
3  Company3  X2011.1   310
4  Company1  X2013.1   350
5  Company2  X2013.1   430
6  Company3  X2013.1   420
7  Company1  X2011.2   290
8  Company2  X2011.2   305

Then use the company or variable as a grouping factor for ggplot2. Obviously you'll want to name these more sensibly. :)
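One way to get more sensible names is to split the `variable` column back into a year and a category. This base-R sketch assumes names of the form "X" + four-digit year + "." + category number, as in the data above, and uses a hand-built subset of the melted rows so it runs on its own.

```r
# A few rows in the shape melt(test) produces (subset of the output above)
test.melt <- data.frame(
  Company  = rep(c("Company1", "Company2", "Company3"), times = 2),
  variable = rep(c("X2011.1", "X2013.1"), each = 3),
  value    = c(300, 320, 310, 350, 430, 420)
)

# Year: the four digits after "X"; Category: the number after the "."
test.melt$Year     <- sub("^X([0-9]{4})\\..*$", "\\1", test.melt$variable)
test.melt$Category <- paste0("Category", sub("^X[0-9]{4}\\.", "", test.melt$variable))

test.melt
```

`sub()` also accepts the factor column that `melt()` actually returns, since it coerces its input with `as.character()`. After this, `Year` can go on the x-axis and `Category` into `facet_wrap()`.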

e.g. you could do

ggplot(melt(test)) + geom_bar(aes(x = Company, y = value, fill = variable), stat = "identity", position = "dodge")

Or something.

Upvotes: 0
