Visahan

Reputation: 1192

Display multiple trend lines in ggplot

I have a table as follows,

    id    | membership |   month    |   year     |   numberofXPurchased
----------+------------+------------+------------+-------------------
     1    |    05      |    02      |   2014     |          5
     1    |    06      |    03      |   2014     |          7     
     1    |    07      |    04      |   2014     |          3
     2    |    01      |    11      |   2014     |          2
     2    |    02      |    12      |   2014     |          1
     2    |    03      |    01      |   2015     |          4

I created a line graph using ggplot to identify the correlation between the membership period and the number of times X was purchased:

    ggplot(data = df, aes(x = memberMonths, y = numberofXPurchased, group = id, color = id)) +
      geom_line() +
      geom_point() +
      theme(legend.position = "none") +
      labs(x = "Membership in Months", y = "X purchased")

This produces a line graph as expected, but as I have over 100000 rows of data, the plot is not interpretable. So I'm trying to display only trend lines instead of a line for each id: one trend line for the entire data set, and a set of trend lines, one per 'year' (maybe in a separate plot).

Adding

    stat_smooth(method = "lm")

or

    geom_smooth(method = "lm")

only adds the trend line to the existing plot, but I want the trend line instead of the per-id lines drawn from df.

Is there an efficient way to do this? Thanks in advance.

Upvotes: 1

Views: 8029

Answers (1)

enongad

Reputation: 401

You can use geom_smooth(); the method = "lm" option fits a linear model:

geom_smooth(method = "lm")

So your code would look like:

    ggplot(data = df, aes(x = memberMonths, y = numberofXPurchased, group = id, color = id)) +
      geom_smooth(method = "lm") +
      geom_point() +
      theme(legend.position = "none") +
      labs(x = "Membership in Months", y = "X purchased")

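It appeared to me that geom_smooth() needs geom_point() to give the correct trend line, so I would keep the points but hide them with alpha = 0 in the geom_point() call. (geom_smooth() computes its fit from the data directly, so dropping geom_point() altogether should also work; the invisible points mainly keep the axis ranges the same.)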

    ggplot(data = df, aes(x = memberMonths, y = numberofXPurchased, group = id, color = id)) +
      geom_smooth(method = "lm") +
      geom_point(alpha = 0) +
      theme(legend.position = "none") +
      labs(x = "Membership in Months", y = "X purchased")
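Note that with group = id the code above still fits one line per id. For the two plots the question asks for (one overall trend line, and one trend line per year), a rough sketch could look like the following. It assumes df has the columns memberMonths, numberofXPurchased and year; the small data frame here is made up purely for illustration and is not the asker's data.

```r
library(ggplot2)

# Hypothetical toy data with the column names used in the question's code.
df <- data.frame(
  id                 = rep(1:4, each = 3),
  memberMonths       = c(5, 6, 7, 1, 2, 3, 2, 3, 4, 8, 9, 10),
  year               = rep(c(2014, 2015), each = 6),
  numberofXPurchased = c(5, 7, 3, 2, 1, 4, 3, 4, 6, 6, 5, 8)
)

# One trend line for the whole data set: no group/colour aesthetic and no
# geom_line()/geom_point(), so only the fitted line is drawn.
p_all <- ggplot(df, aes(x = memberMonths, y = numberofXPurchased)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Membership in Months", y = "X purchased")

# One trend line per year: mapping colour to factor(year) makes each year
# its own group, so each year gets its own linear fit.
p_year <- ggplot(df, aes(x = memberMonths, y = numberofXPurchased,
                         colour = factor(year))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Membership in Months", y = "X purchased", colour = "Year")

print(p_all)
print(p_year)
```

If you would rather have the years in separate panels than as coloured lines in one panel, adding facet_wrap(~ year) to p_all should give that.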

Upvotes: 2
