Reputation: 21
I want to plot census data to compare data for each race over multiple years.
My data frame has the years 1950-2010 (every 10 years) as the rows and race as the columns. Each cell holds the percentage of that race in a given year.
I want my line graph to plot the years on the x axis and race on the y axis. So with my 5 "race" variables, there would be 5 lines of different colors all plotted on the same graph.
I have watched videos and searched all over this site, but nothing I find seems to work the way I want it to.
Edit: I refactored the code and built my own data frame instead of having it return a matrix.
However, I want the right side to say "Race" and then have my 5 lines. I am working on getting one line to show up at all before doing the other 4.
Edit:
This is what I have figured out so far in my code:
Allston <- ggplot(data = dataAllston, aes(Year, White.pct, group = 1)) + geom_point(aes(color = "orange")) + geom_line(aes(color = "orange"))
I want to scale the Y axis from 0 to 1 in 0.2 increments and have the Y label be "Race" instead of the individual column labels. And more than just relabeling: I want the graph to represent the actual increases and decreases, as opposed to the straight diagonal line going down that I get now.
I think it will take me longer to learn how to make the reproducible code than it will to make tweaks.
Edit:
dput(dataAllston)
returns
structure(list(Year = c(1950, 1960, 1970, 1980, 1990, 2000, 2010
), White.pct = structure(7:1, .Label = c("57.0", "59.0", "63.0",
"78.0", "90.8", "98.0", "98.3"), class = "factor"), BlackOrAA.pct =
structure(c(2L,
1L, 3L, 4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00",
"9.00"), class = "factor"), Hispanic.pct = structure(c(1L, 1L,
3L, 4L, 2L, 2L, 2L), .Label = c("0.00", "13.0", "3.10", "6.00"
), class = "factor"), AsianOrPI.pct = structure(c(1L, 1L, 5L,
6L, 2L, 3L, 4L), .Label = c("0.00", "14.0", "18.0", "20.0", "3.20",
"9.00"), class = "factor"), Other.pct = structure(c(2L, 1L, 3L,
4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00", "9.00"
), class = "factor")), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 1
Views: 2835
Reputation: 16178
You first need to reshape your dataset into a longer format, for example with the pivot_longer function from tidyr. In the end, your data should look like the tibble printed below the code.
As your columns are stored as factors (except the Year column), the first step converts all of them to numeric, which is more appropriate for plotting.
library(dplyr)
library(tidyr)
# Convert the factor columns to numeric, then reshape to long format
Reshaped_DF <- dataAllston %>%
  mutate_at(vars(ends_with(".pct")), ~as.numeric(as.character(.))) %>%
  pivot_longer(-Year, names_to = "Races", values_to = "values")
# A tibble: 35 x 3
    Year Races         values
   <dbl> <chr>          <dbl>
 1  1950 White.pct       98.3
 2  1950 BlackOrAA.pct    1.3
 3  1950 Hispanic.pct     0
 4  1950 AsianOrPI.pct    0
 5  1950 Other.pct        1.3
 6  1960 White.pct       98
 7  1960 BlackOrAA.pct    1.2
 8  1960 Hispanic.pct     0
 9  1960 AsianOrPI.pct    0
10  1960 Other.pct        1.2
# … with 25 more rows
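The conversion through as.character() matters because calling as.numeric() directly on a factor returns the internal level codes, not the printed values, which is a likely reason the original plot showed a straight diagonal line. A minimal base R sketch, where the vector `f` is a made-up sample taken from three of the White.pct values:

```r
# `f` is an illustrative factor, not the asker's full column
f <- factor(c("98.3", "98.0", "57.0"))

# Direct conversion returns the level codes (position in the sorted levels)
codes <- as.numeric(f)

# Going through character first recovers the actual numbers
vals <- as.numeric(as.character(f))

print(codes)
print(vals)
```

Here `codes` holds the ranks of the labels while `vals` holds the percentages themselves, so only `vals` is suitable for a y axis.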
Then, you can plot it with ggplot2:
library(ggplot2)
ggplot(Reshaped_DF, aes(x = Year, y = values, color = Races, group = Races)) +
  geom_line() +
  geom_point() +
  ylab("Percentage")
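The question also asks for a Y axis running from 0 to 1 in steps of 0.2 and a legend titled "Race". One possible way to get there, sketched under the assumption that the percentages are divided by 100 first, is shown below. The data frame `plot_df`, the column `share`, and the axis label are illustrative names, and only a few rows of the data are reproduced to keep the example self-contained.

```r
library(ggplot2)

# Small long-format sample (illustrative subset of the reshaped data)
plot_df <- data.frame(
  Year   = c(1950, 1960, 1950, 1960),
  Races  = c("White.pct", "White.pct", "BlackOrAA.pct", "BlackOrAA.pct"),
  values = c(98.3, 98.0, 1.3, 1.2)
)

# Assumption: rescale percentages (0-100) to proportions (0-1)
plot_df$share <- plot_df$values / 100

p <- ggplot(plot_df, aes(x = Year, y = share, color = Races, group = Races)) +
  geom_line() +
  geom_point() +
  # Fix the axis to 0-1 with a tick every 0.2
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2)) +
  # color = "Race" sets the legend title
  labs(y = "Proportion", color = "Race")

print(p)
```

If the percentages should stay on a 0-100 scale, the same idea should work with limits = c(0, 100) and breaks = seq(0, 100, by = 20) instead.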
Does this answer your question?
If not, please consider providing a reproducible example of your dataset that people can easily copy/paste. See this guide: How to make a great R reproducible example
Upvotes: 2