Fopa Léon Constantin

Reputation: 12363

How do I plot a histogram of the summed values of a column in ggplot2?

I have data that I want to plot as a histogram. Here is the data:

convergence,rules,fact,time
1,domain,1802,8629
1,domain,1802,8913
1,rdfs,595,249
1,domain,1,9259
1,videcom,1,9071
2,domain,314151,9413
2,rdfs,8,253
....

What I want is to plot, for each convergence value, the sum of time for each value of rules.

Here is what I did so far:

library(ggplot2)

w <- read.csv(file="s2.csv", head=TRUE, sep=",")

p <- ggplot(data=w, aes(group=convergence, x=factor(rules))) +
  geom_bar(aes(colour="red")) +
  geom_text(aes(y=time + 1000, colour="red", label=time)) +
  facet_grid(convergence ~ .)

ggsave(file="s2.1m.png", width=15)

But the result doesn't sum the times for each rules value as I wanted.

Simply put, I want a histogram in which each rules value gets a bar representing the sum of its corresponding time values.


What am I missing here?

Upvotes: 3

Views: 4426

Answers (2)

BrodieG

Reputation: 52637

You need to use the weight aesthetic. Instead of counting each row as 1, the bar for each rules value then adds up the time values of its rows.

ggplot(w, aes(x=rules, weight=time)) + 
  geom_bar() + facet_grid(convergence ~ .) +
  geom_text(stat="count", aes(label=after_stat(count)), color="red", vjust=-0.1)

For the text to work, we need to use stat="count", which is the same as what geom_bar() is doing implicitly. We can then use after_stat(count), which references the count column in the data frame ggplot2 produces internally after computing the statistic. In older ggplot2 releases the same thing was written with stat="bin" and ..count.. instead.
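To see what the weighted count amounts to, here is a minimal base R sketch. It assumes the seven sample rows from the question are typed in by hand as a data frame `w` (the real s2.csv has more rows). `xtabs()` with a variable on the left-hand side sums that variable per cell, which is the same quantity the weighted bars display.

```r
# Sample rows copied from the question; the full s2.csv is not reproduced here.
w <- data.frame(
  convergence = c(1, 1, 1, 1, 1, 2, 2),
  rules = c("domain", "domain", "rdfs", "domain", "videcom", "domain", "rdfs"),
  fact = c(1802, 1802, 595, 1, 1, 314151, 8),
  time = c(8629, 8913, 249, 9259, 9071, 9413, 253)
)

# Unweighted: each row counts as 1, so each cell holds the number of rows.
row_counts <- xtabs(~ convergence + rules, data = w)

# Weighted: each row contributes its time, so each cell holds the sum of time.
time_sums <- xtabs(time ~ convergence + rules, data = w)

print(row_counts)
print(time_sums)
```

The unweighted table is what a plain geom_bar() draws; the weighted one is what the weight aesthetic produces.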

Upvotes: 4

KFB

Reputation: 3501

Here's an attempt, based on my understanding of the question.

# sample data
DF = read.table(text="  convergence   rules   fact time
1           1  domain   1802 8629
2           1  domain   1802 8913
3           1    rdfs    595  249
4           1  domain      1 9259
5           1 videcom      1 9071
6           2  domain 314151 9413
7           2    rdfs      8  253", header=TRUE)

# the operation
# you need to transform the data before plotting (below is my guess at what you want)
library(dplyr); library(ggplot2)
DF_new = DF %>% group_by(convergence, rules) %>% summarise(sum_time = sum(time))
#   convergence   rules sum_time
# 1           1  domain    26801
# 2           1    rdfs      249
# 3           1 videcom     9071
# 4           2  domain     9413
# 5           2    rdfs      253

ggplot(data=DF_new, aes(x=rules, y=sum_time)) + 
  geom_bar(stat="identity") + 
  geom_text(aes(y=sum_time + 1000, label=sum_time), colour="red") +
  facet_grid(convergence ~.)
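If dplyr is not available, the same summary can probably be built in base R. The following is only a sketch under that assumption: it retypes the sample rows as a data frame and uses `aggregate()`; the names `DF_sums` and `sum_time` are chosen here for illustration.

```r
# Sample rows from the question, entered by hand for a self-contained example.
DF <- data.frame(
  convergence = c(1, 1, 1, 1, 1, 2, 2),
  rules = c("domain", "domain", "rdfs", "domain", "videcom", "domain", "rdfs"),
  fact = c(1802, 1802, 595, 1, 1, 314151, 8),
  time = c(8629, 8913, 249, 9259, 9071, 9413, 253)
)

# Sum time within each (convergence, rules) combination.
DF_sums <- aggregate(time ~ convergence + rules, data = DF, FUN = sum)

# Rename the summed column so it matches the plotting code above.
names(DF_sums)[names(DF_sums) == "time"] <- "sum_time"

print(DF_sums)
```

The resulting data frame has one row per combination and can be passed to the ggplot() call in place of DF_new.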


Upvotes: 2
