Reputation: 81
I have a large dataframe (CO2_df) with many years for many countries, and tried to plot a graph with ggplot2
. This graph will have 6 curves + an aggregate curve. However, my graph looks pretty "funny" and I have no idea why.
The data looks like this (excerpt):
x y x1 x2 x4 x6
1553 1993 0.00000 CO2 Austria 6 6 - Other Sector
1554 2006 0.00000 CO2 Austria 6 6 - Other Sector
1555 2015 0.00000 CO2 Austria 6 6 - Other Sector
2243 1998 12.07760 CO2 Austria 5 5 - Waste management
2400 1992 11.12720 CO2 Austria 5 5 - Waste management
2401 1995 11.11040 CO2 Austria 5 5 - Waste management
2402 2006 10.26000 CO2 Austria 5 5 - Waste management
2489 1998 0.00000 CO2 Austria 6 6 - Other Sector
I have used this code:
ggplot(data=CO2_df, aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("C02 Emissions") +
labs(colour = "Sectors")
scale_color_brewer(palette="Dark2")
CO2_df %>%
group_by(x) %>%
mutate(sum.y = sum(y)) %>%
ggplot(aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("C02 Emissions") +
labs(colour = "Sectors")+
scale_color_brewer(palette="Dark2")+
geom_line(aes(y = sum.y), color = "black")
My questions
1) Why does it look like this and how can I solve it? 2) I have no idea why the value on the y axis are close to zero. They are not... 3) How can I add an entry to the legend for the aggregate line?
Thank you for any sort of help!
Nordsee
Upvotes: 0
Views: 1921
Reputation: 9505
What about something like this:
CO2_df %>% # data
group_by(x,x6) %>% # group by
summarise(y = sum(y)) %>% # add the sum per group
ggplot(aes(x=x, y=y)) + # plot
geom_line(aes(group=x6, fill=x6, color=x6))+
# here you can put a summary line, like sum, or mean, and so on
stat_summary(fun.y = sum, na.rm = TRUE, color = 'black', geom ='line') +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("C02 Emissions") +
labs(colour = "Sectors")+
scale_color_brewer(palette="Dark2"))
With modified data, to see the right behaviour, I've put same years and very different values to understand:
CO2_df <- read.table(text ="
x y x1 x2 x4 x6
1553 1993 20 CO2 'Austria' 6 '6 - Other Sector'
1554 1994 23 CO2 'Austria' 6 '6 - Other Sector'
1555 1995 43 CO2 'Austria' 6 '6 - Other Sector'
2243 1993 12.07760 CO2 'Austria' 5 '5 - Waste management'
2400 1994 11.12720 CO2 'Austria' 5 '5 - Waste management'
2401 1995 11.11040 CO2 'Austria' 5 '5 - Waste management'
2402 1996 10.26000 CO2 'Austria' 5 '5 - Waste management'
2489 1996 50 CO2 'Austria' 6 '6 - Other Sector'", header = T)
Upvotes: 1