I am plotting a bunch of line plots on top of one another, but after they are all plotted I only want to color 10 specific ones (to visualize how my 'targets' traveled over time while still being able to see the mass of other lines behind them). For example, there might be 100 line graphs over time, and I want to color 5 or 10 of them so I can discuss them with respect to the trend of the other 90 grayscale ones.
The following post has a pretty good image that I want to replicate, but with slightly more meat on the bones: I want MANY grayscale lines behind those 3, with those 3 being my highlighted cities in the foreground, so to speak.
My original data was in the following form:
# The unique identifier is a City-State combination;
# the same city name can appear in one state or in many.
# Each city's years range from 1 to 35. Some cities have all
# 35 years, while others are missing some of them.
r1 <- c("city1" , "state1" , "year" , "population" , rnorm(11) , "2")
r2 <- c("city1" , "state2" , "year" , "population" , rnorm(11) , "3")
r3 <- c("city2" , "state1" , "year" , "population" , rnorm(11) , "2")
r4 <- c("city3" , "state2" , "year" , "population" , rnorm(11) , "1")
r5 <- c("city3" , "state2" , "year" , "population" , rnorm(11) , "7")
df <- data.frame(matrix(nrow = 5, ncol = 16))
df[1,] <- r1
df[2,] <- r2
df[3,] <- r3
df[4,] <- r4
df[5,] <- r5
names(df) <- c("City", "State", "Year", "Population", 1:11, "Cluster")
head(df)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# City | State | Year | Population | ... 11 Variables ... | Cluster #
# ----------------------------------------------------------------------#
# Each row is a city instance with these features ... #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
But I thought it might be better to view the data differently, so I also have it in the following format. I am not sure which is better for this problem.
cols <- 1:35  # one name per year column (35 in total)
rows <- c("unique_city1", "unique_city2","unique_city3","unique_city4","unique_city5")
r1 <- rnorm(35)
r2 <- rnorm(35)
r3 <- rnorm(35)
r4 <- rnorm(35)
r5 <- rnorm(35)
df <- data.frame(matrix(nrow = 5, ncol = 35))
df[1,] <- r1
df[2,] <- r2
df[3,] <- r3
df[4,] <- r4
df[5,] <- r5
names(df) <- cols
row.names(df) <- rows
head(df)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# Year1 Year2 .......... Year 35 #
# UniqueCityState1 VAL NA .......... VAL #
# UniqueCityState2 VAL VAL .......... NA #
# . #
# . #
# . #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
I have tried using melt to get the data into a format that ggplot will accept so that it plots each of these cities over time, but nothing has worked. I have also tried writing my own functions that loop through each of my unique city-state combinations to stack ggplots, a topic with a fair amount of research available, but still without success. I am not sure how to find each of these unique city-state pairs and plot them over time using their cluster value, or any numeric value for that matter. Or maybe what I am seeking is not possible; I am not sure.
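For the second (wide) layout, here is a minimal sketch of a melt call that does produce long data, assuming the reshape2 package is installed. A likely snag (an assumption about what went wrong) is that the city names live only in row.names(df), and melt() reshapes columns, so the identifiers have to be copied into a regular column first. The data below is hypothetical.

```r
library(reshape2)

set.seed(1)
# Hypothetical wide data: 3 cities x 35 years, column names "1".."35"
wide <- as.data.frame(matrix(rnorm(3 * 35), nrow = 3))
names(wide) <- 1:35
row.names(wide) <- c("unique_city1", "unique_city2", "unique_city3")

# melt() works on columns, so move the row-name identifiers into a column
wide$City <- row.names(wide)

# id.vars keeps City as the identifier; every other column is stacked
long <- melt(wide, id.vars = "City",
             variable.name = "Year", value.name = "Value")

# The year column comes back as a factor of column names; make it numeric
long$Year <- as.numeric(as.character(long$Year))
head(long)
```

The result has one row per city and year (105 rows here), which is the shape ggplot expects when you map group = City.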
Thoughts?
> head(df)
city state year population stat1 stat2 stat3 stat4 stat5
1 BESSEMER 1 1 31509 0.3808436 0 0.63473928 2.8563268 9.5528262
2 BIRMINGHAM 1 1 282081 0.3119671 0 0.97489728 6.0266377 9.1321287
3 MOUNTAIN BROOK 1 1 18221 0.0000000 0 0.05488173 0.2744086 0.4390538
4 FAIRFIELD 1 1 12978 0.1541069 0 0.46232085 3.0050855 9.8628448
5 GARDENDALE 1 1 7828 0.2554931 0 0.00000000 0.7664793 1.2774655
6 LEEDS 1 1 7865 0.2542912 0 0.12714558 1.5257470 13.3502861
stat6 stat6 stat7 stat8 stat9 cluster
1 26.976419 53.54026 5.712654 0 0.2856327 9
2 35.670605 65.49183 11.982374 0 0.4963113 9
3 6.311399 21.40387 1.426925 0 0.1097635 3
4 21.266759 68.11527 11.480968 0 1.0787487 9
5 6.770567 23.24987 3.960143 0 0.0000000 3
6 24.157661 39.79657 4.450095 0 1.5257470 15
agg
1 99.93970
2 130.08675
3 30.02031
4 115.42611
5 36.28002
6 85.18754
Ultimately I need it in the form of unique cities as row.names, 1:35 as col.names, and the value inside each cell to be agg if that year was present, or NA if it wasn't. Again, I am sure this is possible; I just can't find a good solution, and my current approach is unstable.
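One way to get that table, sketched in base R under the assumption that the data has columns city, state, year (1 to 35) and agg as in the head(df) output above: pre-allocate an all-NA matrix and fill the observed cells with matrix indexing. The toy values and the id format are hypothetical.

```r
# Toy long data standing in for the question's df (values are made up)
long <- data.frame(
  city  = c("BESSEMER", "BESSEMER", "BIRMINGHAM", "LEEDS"),
  state = c(1, 1, 1, 1),
  year  = c(1, 3, 1, 35),
  agg   = c(99.9, 101.2, 130.1, 85.2)
)

# City-State key, since the same city name can occur in several states
long$id <- paste(long$city, long$state, sep = "_")
ids <- unique(long$id)

# Start all-NA: one row per id, one column per year 1..35
wide <- matrix(NA_real_, nrow = length(ids), ncol = 35,
               dimnames = list(ids, 1:35))

# Index by (row, column) pairs to fill each observed cell with agg;
# if an id-year pair occurs more than once, the last value wins
wide[cbind(match(long$id, ids), long$year)] <- long$agg

wide_df <- as.data.frame(wide)
wide_df[, 1:3]
```

Pre-allocating all 35 columns guarantees a column for every year, even one no city reports. tidyr's pivot_wider() can do similar reshaping, but a year that never appears in the data would not get a column.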
If I understand your question correctly, you want to plot all the lines in one color and then plot a few of them in several different colors. You can do this with ggplot2 by calling geom_line twice on two data frames. The first call plots all the city data without mapping the lines to color. The second call plots just the subset for your target cities and maps those lines to color. You will need to reorganize your original data frame and subset it for the target cities. In the following code I used tidyr and dplyr to process the data frame.
### Set.seed to improve reproducibility
set.seed(123)
### Load packages
library(tidyr)
library(dplyr)
library(ggplot2)
### Prepare example data frame
r1 <- rnorm(35)
r2 <- rnorm(35)
r3 <- rnorm(35)
r4 <- rnorm(35)
r5 <- rnorm(35)
df <- data.frame(matrix(nrow = 5, ncol = 35))
df[1,] <- r1
df[2,] <- r2
df[3,] <- r3
df[4,] <- r4
df[5,] <- r5
names(df) <- 1:35
df <- df %>% mutate(City = 1:5)
### Reorganize the data for plotting
df2 <- df %>%
  gather(Year, Value, -City) %>%
  mutate(Year = as.numeric(Year))
The gather function takes df as its first argument. It creates a key column called Year, which stores the year numbers; these are the names of every column in df except City. gather also creates a column called Value, which stores all the numeric values from every column in df except City. Finally, the City column should not be involved in this reshaping, so -City tells gather "do not transform the data from the City column"; its values are simply repeated alongside each Year-Value pair. Because the key values come from column names, Year is created as text, which is why the code converts it with as.numeric.
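In current tidyr, gather() is superseded by pivot_longer(). As a sketch on a small hypothetical data frame, the same reshaping could look like this; the arguments mirror the explanation above (names_to plays the role of the key column, values_to the value column).

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data: two cities, two year columns named "1" and "2"
df <- data.frame(City = 1:2, `1` = c(0.5, 1.5), `2` = c(0.7, 1.1),
                 check.names = FALSE)

df2 <- df %>%
  # Stack every column except City; column names go to Year
  pivot_longer(-City, names_to = "Year", values_to = "Value") %>%
  # Year comes from column names, so it is text until converted
  mutate(Year = as.numeric(Year))

df2
```

One visible difference: pivot_longer() orders the output row by row (all years for City 1, then City 2), while gather() stacks column by column. Line plots are unaffected, because ggplot sorts each group by x.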
### Subset df2, selecting the cities of interest
df3 <- df2 %>%
  # In this example, assume that City 2 and City 3 are of interest
  filter(City %in% c(2, 3))
### Plot the data
ggplot(data = df2, aes(x = Year, y = Value, group = factor(City))) +
  # Plot all city data here in gray lines
  # (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  geom_line(linewidth = 1, color = "gray") +
  # Plot target city data on top, with colors
  geom_line(data = df3,
            aes(x = Year, y = Value, group = City, color = factor(City)),
            linewidth = 2)
The resulting plot can be seen here: https://dl.dropboxusercontent.com/u/23652366/example_plot.png
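Since the data in the question's edit is already long (one row per city, state and year), the same two-layer idea can be applied to it without reshaping. The following is a minimal sketch under that assumption; the simulated values and the target ids are hypothetical, and a City-State key is built because city names repeat across states.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Simulated long data: 20 city names x 2 states x 35 years
long <- expand.grid(city = paste0("city", 1:20), state = 1:2, year = 1:35,
                    stringsAsFactors = FALSE)
long$agg <- rnorm(nrow(long), mean = 100, sd = 20)

# One series per City-State pair
long <- long %>% mutate(id = paste(city, state, sep = "_"))

targets <- c("city3_1", "city7_2", "city12_1")  # hypothetical ids to highlight
highlight <- long %>% filter(id %in% targets)

p <- ggplot(long, aes(x = year, y = agg, group = id)) +
  # Every series in light gray first, so it sits in the background
  geom_line(colour = "gray80") +
  # Targets drawn afterwards, on top, one colour per id
  geom_line(data = highlight, aes(colour = id), linewidth = 1)
p
```

Layer order sets stacking: drawing the gray layer first keeps the colored targets in the foreground. To color by cluster instead, map colour to factor(cluster) in the second layer.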