Reputation: 33
I am trying to plot 2 lines based on 2 variables using ggplot2 in R. Here is a piece from the complete Framingham data set that I am using:
df2 = read.table(text = " number smoker BMI sex
98 No 27.73 Men
99 No 24.35 Men
100 No 25.60 Men
101 Yes 24.33 Men
102 Yes 27.54 Men
299 No 24.62 Women
300 No 31.02 Women
301 Yes 21.68 Women
302 Yes 19.66 Women
303 Yes 26.64 Women", sep = "", header = TRUE)
I tried the following in ggplot and got a graph that I did not intend.
ggplot(df2, aes(smoker, BMI, color=sex)) + geom_line() + geom_point()
I want there to be two lines, one for Men and one for Women. I want the point in each of the smoker categories to represent the mean for that sex group.
Any idea how to do this using this data set? I found examples on Stack Overflow, but they only worked with other data sets.
Upvotes: 2
Views: 961
Reputation: 5520
The images of your charts helped a lot in understanding what you are trying to do. Using ddply with summarize from the plyr package does the same calculation as tapply but returns the result in a data frame that ggplot can use directly. Allowing for the different data used in the two examples, the code below seems to reproduce your chart in R:
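library(plyr)
library(ggplot2)
# Mean BMI for each sex/smoker combination, returned as a data frame
df3 <- ddply(df2, .(sex, smoker), summarize, BMI_mean = mean(BMI))
# Keep smoker discrete on the x axis; group = sex draws one line per sex
ggplot(df3, aes(smoker, BMI_mean, color = sex, group = sex)) + geom_line() +
  scale_x_discrete("Current Sig Smoker Y/N") +
  labs(y = "Mean Body Mass Index (kg/(M*M))", color = "SEX")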
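If plyr is not installed, the same summary can probably be produced with base R's aggregate(). This is only a sketch on the sample rows from the question; the column name BMI_mean is kept from the code above, and the plot is skipped if ggplot2 is not available.

```r
# Same sample rows as in the question
df2 <- read.table(text = "number smoker BMI sex
98 No 27.73 Men
99 No 24.35 Men
100 No 25.60 Men
101 Yes 24.33 Men
102 Yes 27.54 Men
299 No 24.62 Women
300 No 31.02 Women
301 Yes 21.68 Women
302 Yes 19.66 Women
303 Yes 26.64 Women", header = TRUE)

# One row per sex/smoker combination, holding the mean BMI
df3 <- aggregate(BMI ~ sex + smoker, data = df2, FUN = mean)
names(df3)[names(df3) == "BMI"] <- "BMI_mean"
print(df3)

# Plot only if ggplot2 is available; group = sex gives one line per sex
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df3, aes(smoker, BMI_mean, color = sex, group = sex)) +
    geom_line() + geom_point()
  print(p)
}
```

The result has four rows, one per sex/smoker pair, which is the shape geom_line() needs to connect the two smoker categories for each sex.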
Upvotes: 1
Reputation: 33
I found a way to do it, but I am still looking for a smarter way if anyone can assist.
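# Mean BMI by smoker (rows) and sex (columns), computed from df2 above
df3 <- with(df2, tapply(BMI, list(smoker, sex), mean))
# Rebuild a long data frame from the 2 x 2 matrix of means
smoker <- c("No", "Yes", "No", "Yes")
sex <- c("Men", "Men", "Women", "Women")
BMI <- c(df3[1,1], df3[2,1], df3[1,2], df3[2,2])
df4 <- data.frame(smoker, sex, BMI)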
ggplot(df4, aes(smoker, BMI, color=sex)) + geom_line(aes(group=sex)) + geom_point()
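One possibly smarter way is to skip the intermediate data frame and let ggplot2 compute the means itself with stat_summary(). The sketch below assumes ggplot2 3.3.0 or later, where the argument is named fun (older versions used fun.y), and it uses only the sample rows from the question.

```r
library(ggplot2)

# Same sample rows as in the question
df2 <- read.table(text = "number smoker BMI sex
98 No 27.73 Men
99 No 24.35 Men
100 No 25.60 Men
101 Yes 24.33 Men
102 Yes 27.54 Men
299 No 24.62 Women
300 No 31.02 Women
301 Yes 21.68 Women
302 Yes 19.66 Women
303 Yes 26.64 Women", header = TRUE)

# stat_summary() averages BMI within each smoker/sex cell;
# group = sex connects the two smoker categories for each sex
p <- ggplot(df2, aes(smoker, BMI, color = sex, group = sex)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point")
p
```

The group = sex aesthetic is what makes the lines run across the discrete smoker axis, the same role that geom_line(aes(group=sex)) plays above.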
Upvotes: 1