r - ggplot multiple line graphs for each unique instance over time

Question

The Problem

I am plotting a bunch of line plots on top of one another, but I only want to color about 10 of them once they are all drawn together, so that I can see how my 'targets' traveled over time while still viewing the mass of other lines behind them. An example would be 100 line graphs over time, where I color 5 or 10 of them specifically and discuss them with respect to the trend of the other 90 or so grayscale ones.

The following post has a pretty good image that I want to replicate, but with slightly more meat on the bones. I want MANY lines behind those 3, all in grayscale, with those 3 being my highlighted cities in the foreground, per se.

My original data was in the following form:

# The unique identifier is a City-State combination;
# the same city name can appear in one state or in many.
# Each state's years range over 1:35; some series are missing
# values for certain years, while others are complete.

r1 <- c("city1" , "state1" , "year" , "population" , rnorm(11) , "2")
r2 <- c("city1" , "state2" , "year" , "population" , rnorm(11) , "3")
r3 <- c("city2" , "state1" , "year" , "population" , rnorm(11) , "2")
r4 <- c("city3" , "state2" , "year" , "population" , rnorm(11) , "1")
r5 <- c("city3" , "state2" , "year" , "population" , rnorm(11) , "7")

df <- data.frame(matrix(nrow = 5, ncol = 16))
df[1,] <- r1
df[2,] <- r2
df[3,] <- r3
df[4,] <- r4
df[5,] <- r5

names(df) <- c("City", "State", "Year", "Population", 1:11, "Cluster")

head(df)


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# City | State | Year | Population  | ... 11 Variables ... | Cluster    #
# ----------------------------------------------------------------------#
# Each row is a city instance with these features ...                   #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

But I thought it might be better to view the data differently, so I also have it in the following format. I am not sure which is better for this problem.

cols <- 1:35  # 35 year columns, to match ncol = 35 below (0:35 has 36 values and makes names(df) <- cols fail)
rows <- c("unique_city1", "unique_city2","unique_city3","unique_city4","unique_city5")
r1 <- rnorm(35)
r2 <- rnorm(35)
r3 <- rnorm(35)
r4 <- rnorm(35)
r5 <- rnorm(35)

df <- data.frame(matrix(nrow = 5, ncol = 35))
df[1,] <- r1
df[2,] <- r2
df[3,] <- r3
df[4,] <- r4
df[5,] <- r5

names(df) <- cols
row.names(df) <- rows

head(df)


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#                       Year1 Year2 .......... Year 35  #
# UniqueCityState1       VAL    NA  ..........  VAL     #
# UniqueCityState2       VAL    VAL ..........  NA      #
#         .                                             #
#         .                                             #
#         .                                             #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
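
ggplot2 generally expects one row per observation, so the wide layout above would likely need to be stacked into long form before plotting. A minimal base-R sketch of that step, assuming a hypothetical `wide` data frame with cities as row names and years 1:35 as column names (the names `wide`, `long`, and `value` are illustrative, not from my real data):

```r
# Hypothetical wide data: one row per city, one column per year (1..35).
set.seed(1)
wide <- as.data.frame(matrix(rnorm(5 * 35), nrow = 5))
names(wide) <- 1:35
row.names(wide) <- paste0("unique_city", 1:5)
wide[2, 3] <- NA  # a year with no value stays NA

# Stack the columns into long form: one row per (city, year) pair.
# unlist() walks the data frame column by column, so the city names
# cycle fastest and each year is repeated once per city.
long <- data.frame(
  city  = rep(row.names(wide), times = ncol(wide)),
  year  = rep(as.integer(names(wide)), each = nrow(wide)),
  value = unlist(wide, use.names = FALSE)
)

head(long)
```

Missing years are carried through as `NA` rows in `long`; they could be dropped with `long[!is.na(long$value), ]` if gaps in the lines are not wanted.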

Prior Attempts

I have tried using melt to get the data into a long format that ggplot can accept, so that each city is plotted over time, but nothing has worked so far. I have also tried writing my own functions that loop through each unique city-state combination and stack ggplot layers; there is a fair amount of material available on that approach, but I have had no success with it either. I am not sure how to find each unique city-state pair and plot it over time using its cluster value, or any other numeric value for that matter. Or maybe what I am seeking is not possible.
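
For reference, here is a rough sketch of the kind of plot I am after, assuming the data is already in long form. Everything in it is made up for illustration: the `long` data frame, the `value` column, the `id` helper column, and the three series in `targets`. It draws every series in gray with one `geom_line()` layer and then draws only the chosen series again in color on top, with no looping over cities.

```r
library(ggplot2)

# Hypothetical long data: 100 cities observed over years 1..35.
set.seed(42)
long <- data.frame(
  city  = rep(paste0("city", 1:100), each = 35),
  state = rep(paste0("state", rep(1:10, each = 10)), each = 35),
  year  = rep(1:35, times = 100),
  value = as.vector(replicate(100, cumsum(rnorm(35))))
)

# One id per city-state pair, so same-named cities in different
# states are treated as separate series.
long$id <- paste(long$city, long$state, sep = "_")

# Hypothetical choice of series to highlight.
targets <- c("city3_state1", "city47_state5", "city88_state9")
highlighted <- long[long$id %in% targets, ]

p <- ggplot(long, aes(x = year, y = value, group = id)) +
  geom_line(colour = "grey80") +                    # background: all series
  geom_line(data = highlighted, aes(colour = id)) + # foreground: targets only
  labs(colour = "Highlighted") +
  theme_minimal()

p
```

Because the colored layer is added last, the highlighted lines sit in front of the gray ones. The `group = id` aesthetic is what keeps each city-state pair as its own line.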

Thoughts?

EDIT: More information about data structure

> head(df)
            city state year population     stat1 stat2      stat3     stat4      stat5
1       BESSEMER     1    1      31509 0.3808436     0 0.63473928 2.8563268  9.5528262
2     BIRMINGHAM     1    1     282081 0.3119671     0 0.97489728 6.0266377  9.1321287
3 MOUNTAIN BROOK     1    1      18221 0.0000000     0 0.05488173 0.2744086  0.4390538
4      FAIRFIELD     1    1      12978 0.1541069     0 0.46232085 3.0050855  9.8628448
5     GARDENDALE     1    1       7828 0.2554931     0 0.00000000 0.7664793  1.2774655
6          LEEDS     1    1       7865 0.2542912     0 0.12714558 1.5257470 13.3502861
      stat6    stat6     stat7 stat8     stat9 cluster
1 26.976419 53.54026  5.712654     0 0.2856327       9
2 35.670605 65.49183 11.982374     0 0.4963113       9
3  6.311399 21.40387  1.426925     0 0.1097635       3
4 21.266759 68.11527 11.480968     0 1.0787487       9
5  6.770567 23.24987  3.960143     0 0.0000000       3
6 24.157661 39.79657  4.450095     0 1.5257470      15
    agg
1  99.93970
2 130.08675
3  30.02031
4 115.42611
5  36.28002
6  85.18754

Ultimately I need it in a form with the unique cities as row.names, 1:35 as col.names, and each cell holding agg if that year is present for the city or NA if it is not. I am sure this is possible, but I have not found a good solution, and my current approach is unstable.
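
To make the target shape concrete, here is a small base-R sketch using `tapply()`. It assumes at most one row per city-state-year, and the toy `df` below is hypothetical apart from the column names and the two `agg` values copied from the output above (the other `agg` values and the years are invented):

```r
# Hypothetical long data shaped like head(df): one row per city-state-year.
df <- data.frame(
  city  = c("BESSEMER", "BESSEMER", "LEEDS", "LEEDS", "LEEDS"),
  state = c(1, 1, 1, 1, 1),
  year  = c(1, 3, 1, 2, 35),
  agg   = c(99.93970, 101.5, 85.18754, 80.2, 79.9)  # 101.5, 80.2, 79.9 are invented
)

# Unique identifier per city-state pair.
id <- paste(df$city, df$state, sep = "_")

# Making year a factor with levels 1:35 forces all 35 columns to exist,
# even for years that never occur in the data.
year <- factor(df$year, levels = 1:35)

# Cells with no matching row are filled with NA by tapply().
# function(v) v[1] keeps the first value if a pair were ever duplicated.
wide <- tapply(df$agg, list(id, year), FUN = function(v) v[1])

dim(wide)
wide[, 1:5]
```

The result is a matrix with the city-state ids as row names and "1" to "35" as column names; `as.data.frame(wide)` would turn it into a data frame if that is needed.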
