tl124
tl124

Reputation: 21

Create line graph with multiple lines in R

I want to plot census data to compare data for each race over multiple years.

My data frame has years 1950-2010 (every 10 years) as the rows and race as the columns. The data at the cross section is the percentage of that race in a given year.

I want my line graph to plot the years on the x axis and race on the y axis. So with my 5 "race" variables, there would be 5 lines of different colors all plotted on the same graph.

I have tried to watch videos and scoured all over here but nothing I find seems to work the way I want it to.

Edit: I refactored to the code and built my own dataframe instead of having it return a matrix.

However, I want the right side to say "Race" and then have my 5 lines. I am working on getting one line to show up at all before doing the other 4.

new dataframe returned plot

Edit: I have figured out thus far in my code - Allston <- ggplot(data = dataAllston, aes(Year, White.pct, group = 1)) + geom_point(aes(color = "orange")) + geom_line(aes(color = "orange"))

I want to scale the Y axis and from 0-1 in 0.2 increments and have the Y be "Race" instead of the individual labels. And more than just relabeling -- I want the graph to be representative of the actual increases/decreases as opposed to a straight line diagonally down as it is now.

I think it will take me longer to learn how to make the reproducible code than it will to make tweaks.

new returned plot

Edit:

dput(dataAllston)

returns

structure(list(Year = c(1950, 1960, 1970, 1980, 1990, 2000, 2010
), White.pct = structure(7:1, .Label = c("57.0", "59.0", "63.0", 
"78.0", "90.8", "98.0", "98.3"), class = "factor"), BlackOrAA.pct = 
structure(c(2L, 
1L, 3L, 4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00", 
"9.00"), class = "factor"), Hispanic.pct = structure(c(1L, 1L, 
3L, 4L, 2L, 2L, 2L), .Label = c("0.00", "13.0", "3.10", "6.00"
), class = "factor"), AsianOrPI.pct = structure(c(1L, 1L, 5L, 
6L, 2L, 3L, 4L), .Label = c("0.00", "14.0", "18.0", "20.0", "3.20", 
"9.00"), class = "factor"), Other.pct = structure(c(2L, 1L, 3L, 
4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00", "9.00"
), class = "factor")), class = "data.frame", row.names = c(NA, 

-7L))

result from dput(data)

Upvotes: 1

Views: 2835

Answers (1)

dc37
dc37

Reputation: 16178

You need first to reshape your dataset into a longer format by using for example pivot_longer function from tidyr. At the end, your data should look like this.

As your data are in factor format (except Year column), the first line will convert all of them into a numerical format much appropriate for plotting.

library(dplyr)
library(tidyr)

Reshaped_DF <- df %>% mutate_at(vars(ends_with(".pct")), ~as.numeric(as.character(.))) %>%
   pivot_longer(-Year, names_to = "Races", values_to = "values")

# A tibble: 35 x 3
    Year Races         values
   <dbl> <chr>          <dbl>
 1  1950 White.pct       98.3
 2  1950 BlackOrAA.pct    1.3
 3  1950 Hispanic.pct     0  
 4  1950 AsianOrPI.pct    0  
 5  1950 Other.pct        1.3
 6  1960 White.pct       98  
 7  1960 BlackOrAA.pct    1.2
 8  1960 Hispanic.pct     0  
 9  1960 AsianOrPI.pct    0  
10  1960 Other.pct        1.2
# … with 25 more rows

Then, you can plot it in ggplot2 by doing:

library(ggplot2)

ggplot(Reshaped_DF,aes(x = Year, y = values, color = Races, group = Races))+
  geom_line()+
  geom_point()+
  ylab("Percentage")

enter image description here Does it answer your question ?

If not, please consider providing a reproducible example of your dataset that people can easily copy/paste. See this guide: How to make a great R reproducible example

Upvotes: 2

Related Questions