Vermina_H

Reputation: 11

ggplot2: Area per factor barplot (geom_bar) - containing missing values (geom_text)

I'm trying to do some area calculations for a forestry project. The data consists of 1241 observations with two relevant variables:

MiWaReVe: 20 classes of forest types, abbreviated with number codes, in the "factor" format.

area_ha: the area of a forest type in hectares, in the "num" format.

Here is my minimal dataset:

structure(list(Id = c(0L, 2L, 3L, 4L, 5L, 17L), MiWaReVe = structure(c(7L, 
7L, 14L, 17L, 17L, 17L), .Label = c("", "0", "1.1.", "2.1.", 
"2.2.1.", "2.2.2.", "2.3.1.", "2.3.2.", "3.1.1.", "3.1.2.", "3.2.1.", 
"3.2.2.", "3.2.3.", "4.1.", "4.2.", "5.1.", "5.2.", "6.", "7.", 
"8."), class = "factor"), area_ha = c(8.08759, 8.76723, 5.5033, 
1.22659, 4.31278, 8.23421), Owner = structure(c(2L, 2L, 2L, 2L, 
2L, 2L), .Label = c("Bundesforsten", "Kommunalwald", "Privatwald", 
"Staatswald"), class = "factor"), hint_cl = structure(c(3L, 3L, 
3L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D", "E", "X"), class = "factor"), 
area_in_per = c(0.216871128099877, 0.23509587657276, 0.147572624140449, 
0.032891375182969, 0.115648476721321, 0.220802786950289)), .Names = c("Id", 
"MiWaReVe", "area_ha", "Owner", "hint_cl", "area_in_per"), row.names = c(NA, 
6L), class = "data.frame")
  Id MiWaReVe area_ha        Owner hint_cl area_in_per
1  0   2.3.1. 8.08759 Kommunalwald       C  0.21687113
2  2   2.3.1. 8.76723 Kommunalwald       C  0.23509588
3  3     4.1. 5.50330 Kommunalwald       C  0.14757262
4  4     5.2. 1.22659 Kommunalwald       D  0.03289138
5  5     5.2. 4.31278 Kommunalwald       D  0.11564848
6 17     5.2. 8.23421 Kommunalwald       D  0.22080279

My goal is to calculate the total area of each of the forest types and build a barplot showing percentage distribution, using ggplot2. I did this using the following code:

library("ggplot2")
library("scales")


MiWaRe=read.table(file="2017_11_MiWaRe.csv", sep=";", dec=",", header=TRUE)

str(MiWaRe)

# total area AOI
area_total=sum(MiWaRe$area_ha)


# area of each plot in % in a new column
MiWaRe=cbind(MiWaRe, "area_in_per"=MiWaRe$area_ha/area_total*100)
MiWaRe
sum(MiWaRe$`area_in_per`) # check


ggplot(data=MiWaRe, aes(x = factor(MiWaReVe), y=((area_in_per)/sum(area_in_per))))  +            
geom_bar(stat="identity")  +           
scale_y_continuous(labels = percent)

With this code I get a basic version of the barplot I need.

Now I want the exact percentage values shown above my bars. I tried extending my code as follows:

ggplot(data=MiWaRe, aes(x = factor(MiWaReVe), y=((area_in_per)/sum(area_in_per))))  +            
geom_bar(stat="identity")  +           
scale_y_continuous(labels = percent)+
geom_text(aes(label = scales::percent((area_in_per)/sum(area_in_per)), y= ..prop.. ), stat= "count", vjust = 25)

but it labels only one bar (it's the forest type which occurs only once) and gives me the following: "Warning message: Removed 19 rows containing missing values (geom_text)." I've done some research on this warning message, but I still think the problem is deeper than too little display space.

I was also trying:

ggplot(data=MiWaRe, aes(x = factor(MiWaReVe), y=((area_in_per)/sum(area_in_per))))  +            
geom_bar(stat="identity")  +           
scale_y_continuous(labels = percent)+
geom_text(aes( label = scales::percent(..prop..),
             y= ..prop.. ), stat= "count", vjust = -1)

but it doesn't work either, of course.

As you have surely noticed, I'm still very new to R. In fact, I've only been teaching myself the program for a week, but I've been able to solve many other problems thanks to the forum posts here. I've now been stuck on this problem for some hours, so I would be very grateful if someone could help me get further on the long road to mastering R.

Upvotes: 1

Views: 139

Answers (1)

clemens

Reputation: 6813

You can use geom_text_repel() from the ggrepel package to add those labels.

First, I create an area_pc variable to make it easier:

library(ggplot2)
library(scales)
library(ggrepel)
library(dplyr)


MiWaRe$area_pc <- MiWaRe$area_in_per / sum(MiWaRe$area_in_per)

Then I create the data to add labels:

labels <- MiWaRe %>%
  group_by(MiWaReVe) %>%
  summarise(pc_label = sum(area_pc))
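
For readers without dplyr, the same label table can probably be built in base R. The following is only a sketch under a few assumptions: sample_df and labels_base are illustrative names that do not appear in the question, the data frame is rebuilt from the six-row sample rather than read from the CSV, and the share is computed from area_ha directly instead of from area_in_per.

```r
# Rebuild the six-row sample from the question (only the two relevant columns).
# sample_df is a hypothetical name used for this sketch.
sample_df <- data.frame(
  MiWaReVe = factor(c("2.3.1.", "2.3.1.", "4.1.", "5.2.", "5.2.", "5.2.")),
  area_ha  = c(8.08759, 8.76723, 5.5033, 1.22659, 4.31278, 8.23421)
)

# Share of the total area contributed by each row.
sample_df$area_pc <- sample_df$area_ha / sum(sample_df$area_ha)

# One row per forest type: the summed share, which is the height of the stacked bar.
labels_base <- aggregate(area_pc ~ MiWaReVe, data = sample_df, FUN = sum)
names(labels_base)[2] <- "pc_label"
labels_base

# For comparison: the number of rows per forest type.
# stat = "count" tallies rows like this, not the summed area.
table(sample_df$MiWaReVe)
```

The comparison at the end is the reason for summing first: a label layer based on stat = "count" works from the number of rows per forest type, so it cannot give the area share unless the areas are aggregated beforehand.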

Then simply add it to the plot you have created earlier:

ggplot(data=MiWaRe, aes(x = factor(MiWaReVe), y = area_pc)) +            
  geom_bar(stat="identity")  +           
  scale_y_continuous(labels = percent) +
  geom_text_repel(data = labels, aes(x = factor(MiWaReVe),
                                     y = pc_label,
                                     label = scales::percent(pc_label)))  

The result looks like this (the plot image is not reproduced here).
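
If ggrepel is not an option, a variant that aggregates first and then labels with plain geom_text() may also work. This is a sketch, not the approach from the answer above: sample_df, type_share and share are illustrative names, the data is the six-row sample from the question, and the accuracy argument of scales::percent() assumes a reasonably recent version of scales.

```r
library(ggplot2)
library(scales)

# Six-row sample from the question (hypothetical object name).
sample_df <- data.frame(
  MiWaReVe = factor(c("2.3.1.", "2.3.1.", "4.1.", "5.2.", "5.2.", "5.2.")),
  area_ha  = c(8.08759, 8.76723, 5.5033, 1.22659, 4.31278, 8.23421)
)

# Total area per forest type, then each type's share of the overall area.
type_share <- aggregate(area_ha ~ MiWaReVe, data = sample_df, FUN = sum)
type_share$share <- type_share$area_ha / sum(type_share$area_ha)

# One bar and one label per forest type; both layers use the same summarised data.
p <- ggplot(type_share, aes(x = MiWaReVe, y = share)) +
  geom_col() +
  geom_text(aes(label = percent(share, accuracy = 0.1)), vjust = -0.5) +
  scale_y_continuous(labels = percent) +
  # Leave some headroom so the label on the tallest bar is not clipped.
  expand_limits(y = max(type_share$share) * 1.1)

p
```

Because the bars and the labels are drawn from the same one-row-per-type table, there is no mismatch between the number of bars and the number of label rows.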

Upvotes: 0
