heatherisis

Reputation: 33

Plotting lines with multiple variables in ggplot

I am trying to plot two lines based on two variables using ggplot2 in R. Here is a sample from the Framingham data set I am using:

df2 = read.table(text = " number smoker   BMI   sex
98      No 27.73   Men
99      No 24.35   Men
100     No 25.60   Men
101    Yes 24.33   Men
102    Yes 27.54   Men
299     No 24.62 Women
300     No 31.02 Women
301    Yes 21.68 Women
302    Yes 19.66 Women
303    Yes 26.64 Women", sep = "", header = TRUE)

I tried the following in ggplot, but the resulting graph is not what I intended:

ggplot(df2, aes(smoker, BMI, color=sex)) + geom_line() + geom_point()

I want there to be two lines, one for Men and one for Women. I want the point in each of the smoker categories to represent the mean for that sex group.

Any idea how to do this with this data set? The examples I found on Stack Overflow worked with other data sets.

Upvotes: 2

Views: 961

Answers (2)

WaltS

Reputation: 5520

The images of your charts helped a lot in understanding what you were trying to do. Using ddply with summarize from the plyr package does the same calculation as tapply, but it returns the result as a data frame that ggplot can use directly. Allowing for the different data used in the two charts, the code below should reproduce your chart in R:

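 library(plyr)
 library(ggplot2)
 # One row per sex/smoker combination, holding the mean BMI
 df3 <- ddply(df2, .(sex, smoker), summarize, BMI_mean = mean(BMI))
 # group = sex joins each sex's points into its own line across the discrete x axis
 ggplot(df3, aes(smoker, BMI_mean, color = sex, group = sex)) + geom_line() +
       scale_x_discrete("Current Cig Smoker Y/N") +
       labs(y = "Mean Body Mass Index (kg/(M*M))", color = "SEX")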
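
As an alternative to precomputing the means, ggplot2 can compute them itself with stat_summary(). This is a minimal sketch, not taken from either answer, assuming ggplot2 3.3.0 or later (where the argument is named fun; older versions used fun.y). The data frame rebuilds the question's sample rows with the number column omitted.

```r
library(ggplot2)

# Sample rows from the question (number column omitted; smoker and sex as character)
df2 <- data.frame(
  smoker = c("No", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes"),
  BMI    = c(27.73, 24.35, 25.60, 24.33, 27.54, 24.62, 31.02, 21.68, 19.66, 26.64),
  sex    = rep(c("Men", "Women"), each = 5)
)

# stat_summary() replaces the raw BMI values at each x position (per group)
# with their mean; group = sex makes the line layer connect one sex's means.
p <- ggplot(df2, aes(smoker, BMI, color = sex, group = sex)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point")

print(p)
```

Because the summary happens inside the plot, the raw data frame can be passed as-is, and no intermediate df3 is needed.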


Upvotes: 1

heatherisis

Reputation: 33

I found a way to do it, but I am still looking for a smarter way if anyone can assist.

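library(ggplot2)
# Matrix of mean BMI: rows = smoker (No, Yes), columns = sex (Men, Women)
df3 <- with(df2, tapply(BMI, list(smoker, sex), mean))
# Rebuild a long data frame with one row per smoker/sex combination
smoker <- c("No", "Yes", "No", "Yes")
sex <- c("Men", "Men", "Women", "Women")
BMI <- c(df3[1,1], df3[2,1], df3[1,2], df3[2,2])
df4 <- data.frame(smoker, sex, BMI)
# group = sex draws one line per sex across the discrete smoker axis
ggplot(df4, aes(smoker, BMI, color=sex)) + geom_line(aes(group=sex)) + geom_point()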

(Image: the correct R plot)
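
The manual reshaping of the tapply() matrix can likely be avoided with base R's aggregate(), which, given a formula, returns the group means directly as a long data frame. This is a hedged sketch rather than the poster's code; the data frame below rebuilds the question's sample rows with the number column omitted.

```r
library(ggplot2)

# Sample rows from the question (number column omitted)
df2 <- data.frame(
  smoker = c("No", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes"),
  BMI    = c(27.73, 24.35, 25.60, 24.33, 27.54, 24.62, 31.02, 21.68, 19.66, 26.64),
  sex    = rep(c("Men", "Women"), each = 5)
)

# One row per sex/smoker combination; the BMI column now holds the group mean
df4 <- aggregate(BMI ~ sex + smoker, data = df2, FUN = mean)

ggplot(df4, aes(smoker, BMI, color = sex)) +
  geom_line(aes(group = sex)) +
  geom_point()
```

The plotting call is unchanged from the answer above; only the construction of df4 differs.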

Upvotes: 1
