Reputation: 1225
Given this data frame, obtained from a questionnaire answered by people from different neighborhoods, I'd like to create a bar plot showing the degree of identification per neighborhood.
In fact I managed to do it with the following code:
library(ggplot2)
df = read.csv("http://pastebin.com/raw.php?i=77QPBc5T")
ggplot(df,
       aes(x = factor(Identificación.con.el.barrio),
           fill = Nombre.barrio)
) +
  geom_histogram(position="dodge") +
  ggtitle("¿Te identificas con tu barrio?") +
  labs(x="Grado de identificación con el barrio", fill="Barrios")
Resulting in the following plot:
However, since each neighborhood has a different population size, the sample size per neighborhood also differs a lot (e.g. Arcosur has only 24 respondents whereas Arrabal has 69), so raw counts may be misleading (see below):
library(dplyr)
df = tbl_df(df)
df %>%
  group_by(Nombre.barrio) %>%
  summarise(Total = n())
Source: local data frame [10 x 2]

   Nombre.barrio Total
1       Almozara    68
2        Arcosur    24
3        Arrabal    69
4       Bombarda    20
5       Delicias    68
6          Jesús    69
7      La Bozada    32
8    Las fuentes    64
9         Oliver    68
10      Picarral    68
For this reason I'd like to have relative values on the y axis, showing the percentage of respondents in each neighborhood who chose each possible answer. Unfortunately I have no idea how to achieve this, since I am pretty new to R.
Upvotes: 2
Views: 2309
Reputation: 19867
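library(ggplot2)
library(dplyr)

df = read.csv("http://pastebin.com/raw.php?i=77QPBc5T")
df = as_tibble(df)  # tbl_df() is deprecated in recent dplyr versions

# Count respondents per neighborhood and answer. summarise() drops the last
# grouping variable, so the result is still grouped by Nombre.barrio and
# sum(Total) in mutate() is the total for that neighborhood only.
d <- df %>%
  group_by(Nombre.barrio, Identificación.con.el.barrio) %>%
  summarise(Total = n()) %>%
  mutate(freq = Total / sum(Total))

# stat = "identity" plots the precomputed freq values instead of counting rows
ggplot(d,
       aes(x = factor(Identificación.con.el.barrio),
           y = freq,
           fill = Nombre.barrio)
) +
  geom_bar(position = "dodge", stat = "identity") +
  ggtitle("¿Te identificas con tu barrio?") +
  labs(x = "Grado de identificación con el barrio", fill = "Barrios")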
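The key step is dividing each count by the total for its own neighborhood rather than by the overall total. Here is a minimal sketch of that step on made-up data, so it can be checked by hand. The `toy` data frame and its columns `barrio` and `answer` are invented for illustration and are not taken from the questionnaire; the percent-formatted y axis via `scales::percent` is an optional extra, assuming the scales package (installed alongside ggplot2) is available.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two neighborhoods with different sample sizes
toy <- data.frame(
  barrio = c(rep("A", 4), rep("B", 2)),
  answer = c(1, 1, 2, 3, 1, 2)
)

shares <- toy %>%
  count(barrio, answer, name = "Total") %>%  # one row per barrio/answer pair
  group_by(barrio) %>%                       # normalize within each barrio
  mutate(freq = Total / sum(Total)) %>%
  ungroup()

# Dodged bars of the precomputed shares, with the y axis labelled in percent
p <- ggplot(shares, aes(x = factor(answer), y = freq, fill = barrio)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Answer", y = "Share of respondents", fill = "Neighborhood")
```

Printing `p` draws the plot. Because the shares are computed within each neighborhood, they sum to 1 per neighborhood, so bars from a small sample and a large sample are directly comparable.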
Upvotes: 1