Reputation: 1192
I have a table as follows:
id | membership | month | year | numberofXPurchased
---+------------+-------+------+-------------------
1  | 05         | 02    | 2014 | 5
1  | 06         | 03    | 2014 | 7
1  | 07         | 04    | 2014 | 3
2  | 01         | 11    | 2014 | 2
2  | 02         | 12    | 2014 | 1
2  | 03         | 01    | 2015 | 4
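For anyone who wants to reproduce this, the sample rows above could be built as a data frame roughly like this. It is only a sketch: it assumes that `memberMonths`, used in the plotting code, is the `membership` column stored as a number.

```r
# Sample rows from the table above, as a data frame.
# Assumption: `memberMonths` is the `membership` column stored as a number.
df <- data.frame(
  id                 = c(1, 1, 1, 2, 2, 2),
  memberMonths       = c(5, 6, 7, 1, 2, 3),
  month              = c(2, 3, 4, 11, 12, 1),
  year               = c(2014, 2014, 2014, 2014, 2014, 2015),
  numberofXPurchased = c(5, 7, 3, 2, 1, 4)
)

# Inspect the column types and the first values.
str(df)
```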
I created a line graph using ggplot to identify the correlation between the membership period and the number of times X was purchased:
ggplot(data = df, aes(x = memberMonths, y = numberofXPurchased, group = id, color = id)) +
  geom_line() +
  geom_point() +
  theme(legend.position = "none") +
  labs(x = "Membership in Months", y = "X purchased")
This produces a line graph as expected, but since I have over 100,000 rows of data, the plot is not interpretable. So I'm trying to display only trend lines instead of a line for each id: one trend line representing the entire data set, and a set of trend lines, one for each 'year' (maybe in another plot).
Adding stat_smooth(method = "lm") or geom_smooth(method = "lm") only adds the trend line to the existing plot, but I want the trend line instead of the lines drawn from the data in df.
Is there an efficient way to do this? Thanks in advance.
Upvotes: 1
Views: 8029
Reputation: 401
You can use geom_smooth(); the method = "lm" option fits a linear model:
geom_smooth(method = "lm")
So your code would look like:
ggplot(data = df, aes(x = memberMonths, y = numberofXPurchased, group = id, color = id)) +
  geom_smooth(method = "lm") +
  geom_point() +
  theme(legend.position = "none") +
  labs(x = "Membership in Months", y = "X purchased")
As it appears that geom_smooth() needs geom_point() to give the correct trend line, I would keep the points and hide them with alpha=0 within the geom_point() call:
ggplot(data = df, aes(x = memberMonths, y = numberofXPurchased, group = id, color = id)) +
  geom_smooth(method = "lm") +
  geom_point(alpha = 0) +
  theme(legend.position = "none") +
  labs(x = "Membership in Months", y = "X purchased")
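To get the two plots the question asks for (one overall trend line, and one trend line per year), another option is to drop the per-id grouping. As far as I know, geom_smooth() computes its fit from the data mapped in ggplot() and does not depend on geom_point(); without the point layer the axis range may shrink, since the scales then only have to cover the fitted values. The following is a minimal sketch, not a definitive solution. It assumes `memberMonths` is the numeric membership column, the last two rows of the toy data are made up so that each year has enough points for a fit, and the names `p_overall`, `p_by_year` and `fit_coef` are only illustrative.

```r
library(ggplot2)

# Toy data modelled on the question's table.
# Assumption: memberMonths is the membership column as a number.
# The last two rows are made-up values, added so that 2015 has enough
# points for a linear fit.
df <- data.frame(
  id                 = c(1, 1, 1, 2, 2, 2, 2, 2),
  memberMonths       = c(5, 6, 7, 1, 2, 3, 4, 5),
  year               = c(2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015),
  numberofXPurchased = c(5, 7, 3, 2, 1, 4, 6, 5)
)

# One trend line for the whole data set: no group/color aesthetic and no
# point or line layer, so only the fitted line (with its band) is drawn.
p_overall <- ggplot(df, aes(x = memberMonths, y = numberofXPurchased)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Membership in Months", y = "X purchased")

# One trend line per year: map year (as a factor) to color, which also
# splits the data into one group per year.
p_by_year <- ggplot(df, aes(x = memberMonths, y = numberofXPurchased,
                            color = factor(year))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Membership in Months", y = "X purchased", color = "Year")

print(p_overall)
print(p_by_year)

# The overall line corresponds to this linear model; its coefficients
# give the intercept and slope as numbers.
fit_coef <- coef(lm(numberofXPurchased ~ memberMonths, data = df))
print(fit_coef)
```

If separate panels are preferred over colors, adding facet_wrap(~ year) to the first plot should give one panel per year.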
Upvotes: 2