Reputation: 47
After searching over an hour (this forum, Youtube, class notes, google) I've found no help for my question. I'm a complete newb who knows nothing about R or stats.
I'm attempting to create a linear mixed effects model in R. I'm measuring leaf width across three different locations (Jacksonville FL, Augusta GA, & Atlanta GA), and within those three locations there is a high-nitrogen and low-nitrogen plot. I have 150 leaf measurements from 50 trees.
My limited understanding tells me that the leaf width is the continuous response variable, and city and plot are the discrete explanatory variables. The random effect would be the individual trees, since the leaf width within a single tree is non-independent.
I've used "nlme" to make a model:
leaf.width.model <- lme(width ~ city*plot, (1|tree.id), data=leaf)
I then ran an ANOVA test, and it suggested there's something going on with city and the interaction between city and plot. This is where I'm stuck. I want to make a plot that has lines for all three cities, but I haven't a clue how to do that. When I try to use the plot function, I just get a boxplot.
I've literally tried for hours and am more lost and confused than before.
1) How can I make this graph?
2) What other tests should I do to analyze and/or visualize this data?
I am forever grateful for any help at all. I really want to learn R and stats very badly, but I'm getting discouraged.
Thank you,
Rich
P.S Here is the output of the dput
function:
> dput(tree) structure(list(tree.id = structure(c(24L, 24L, 32L, 25L, 25L, 24L, 24L, 32L, 25L, 25L, 43L, 45L, 45L, 43L, 23L, 23L, 45L, 45L, 23L, 23L, 41L, 41L, 38L, 11L, 11L, 38L, 41L, 41L, 11L, 11L, 14L, 14L, 29L, 13L, 13L, 14L, 14L, 29L, 13L, 13L, 4L, 4L, 1L, 1L, 20L, 1L, 1L, 20L, 6L, 8L, 8L, 5L, 5L, 6L, 4L, 4L, 8L, 8L, 5L, 5L, 9L, 9L, 10L, 10L, 12L, 12L, 13L, 13L, 22L, 22L, 23L, 23L, 24L, 24L, 25L, 25L, 25L, 25L, 40L, 40L, 41L, 41L, 38L, 38L, 39L, 39L, 14L, 14L, 14L, 15L, 15L, 28L, 28L, 29L, 29L, 35L, 35L, 36L, 36L, 37L, 37L, 42L, 42L, 43L, 43L, 44L, 44L, 45L, 45L, 46L, 46L, 47L, 47L, 2L, 1L, 3L, 3L, 4L, 4L, 7L, 11L, 11L, 16L, 16L, 20L, 20L, 21L, 21L, 17L, 17L, 18L, 18L, 19L, 19L, 26L, 26L, 27L, 27L, 30L, 30L, 31L, 31L, 32L, 32L, 33L, 33L, 34L, 34L, 48L), .Label = c("Tree_112", "Tree_112 ", "Tree_115", "Tree_130", "Tree_137", "Tree_139", "Tree_140", "Tree_141", "Tree_153", "Tree_154", "Tree_156", "Tree_159", "Tree_166", "Tree_169", "Tree_171", "Tree_180", "Tree_182", "Tree_184", "Tree_185", "Tree_202", "Tree_213", "Tree_218", "Tree_222", "Tree_227", "Tree_239", "Tree_242", "Tree_246", "Tree_247", "Tree_252", "Tree_260", "Tree_267", "Tree_269", "Tree_271", "Tree_272", "Tree_291", "Tree_293", "Tree_298", "Tree_327", "Tree_329", "Tree_336", "Tree_350", "Tree_401", "Tree_403", "Tree_405", "Tree_407", "Tree_409", "Tree_420", "Tree_851"), class = "factor"), city = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Atlanta", "Augusta", "Jacksonville"), class = "factor"), plot = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("High-N", "Low-N"), class = "factor"), width = c(0.66, 0.716, 0.682, 0.645, 0.645, 0.696, 0.733,
0.707, 0.668, 0.686, 0.617, 0.733, 0.73, 0.615, 0.669, 0.746, 0.687, 0.682, 0.76, 0.713, 0.651, 0.664, 0.679, 0.729, 0.756,
0.669, 0.647, 0.713, 0.767, 0.685, 0.69, 0.731, 0.781, 0.729,
0.725, 0.739, 0.769, 0.791, 0.676, 0.688, 0.719, 0.753, 0.748,
0.791, 0.785, 0.78, 0.723, 0.756, 0.664, 0.645, 0.653, 0.615,
0.591, 0.642, 0.693, 0.716, 0.694, 0.676, 0.662, 0.629, 0.665,
0.748, 0.726, 0.693, 0.715, 0.714, 0.764, 0.732, 0.61, 0.721,
0.703, 0.713, 0.746, 0.752, 0.662, 0.733, 0.707, 0.674, 0.734,
0.79, 0.732, 0.794, 0.703, 0.712, 0.737, 0.731, 0.747, 0.746,
0.787, 0.709, 0.716, 0.764, 0.77, 0.764, 0.802, 0.663, 0.777,
0.642, 0.779, 0.81, 0.724, 0.645, 0.68, 0.637, 0.695, 0.768,
0.761, 0.7, 0.759, 0.726, 0.696, 0.794, 0.774, 0.799, 0.747,
0.606, 0.691, 0.733, 0.707, 0.698, 0.706, 0.72, 0.694, 0.697,
0.737, 0.716, 0.73, 0.706, 0.667, 0.734, 0.528, 0.695, 0.684,
0.763, 0.733, 0.809, 0.6, 0.676, 0.718, 0.759, 0.609, 0.665,
0.667, 0.647, 0.701, 0.663, 0.688, 0.693, 0.899)), .Names = c("tree.id", "city", "plot", "width"), class = "data.frame", row.names = c(NA, -149L))
Thank you all so much for your comments, I sincerely appreciate everyone's help!
Upvotes: 0
Views: 1192
Reputation: 126
leaf.width.model <- lme(width ~ city*plot, (1|tree.id), data=leaf)
In this model if you want to plot something, you are probably trying to answer: How much is the average leaf width for all trees in each city for each type of plot.
To show this information in a figure, you need to plot width
on y axis plot plot
(high and low nitrogen) on x axis and group the data by city
. Then you will get the 3 lines you are taking about. However, you need to get the average width in each group as you only want to show city variation.
To get this plot from raw data: (Using fake data provided by gfgm)
set.seed(100)
leaf <- data.frame(tree.id = rep(1:50, each = 3),
city = rep(c("Jackson", "Augusta", "Atlanta"), each = 50),
plot = rep(c(1, 0), each = 25))
# I'll make the average of width different for each plot
leaf$width <- rnorm(nrow(leaf), leaf$plot, 1)
library(plotly)
library(tidyverse)
leaf %>%
group_by(city,plot) %>%
summarise(avwidth = mean(width, na.rm=T),
avsd = 1.96*sd(width, na.rm=T)/sqrt(25)) %>%
plot_ly(x = ~plot, y = ~avwidth, color= ~city,
type="scatter", mode="markers+lines",
error_y = ~list(array=avsd)
)
Upvotes: 0
Reputation: 3647
As suggested in comments, a line plot might not make sense for your data, as you are studying how width varies in discrete categories (in separate cities and separate plots). Boxplots would make sense as you can make them for each of the interactions of city and plot. To give you a sense of what you can do I generated some fake data and made an example of the sort of plot that might be helpful to you:
# fake data
leaf <- data.frame(tree.id = rep(1:50, each = 3),
city = rep(c("Jackson", "Augusta", "Atlanta"), each = 50),
plot = rep(1:6, each = 25))
# I'll make the average of width different for each plot
leaf$width <- rnorm(nrow(leaf), leaf$plot, 1)
# plotting the data
library(ggplot2) # this is a great library for plotting in R
ggplot(leaf, aes(x = factor(plot), y = width, color = factor(plot))) +
facet_grid(~city, scales = 'free_x') + # This creates a subplot for each city
geom_boxplot() +
geom_point(position = "jitter") +
theme_bw()
In this plot I added the points (the leaf widths for each individual tree) but I 'jittered' them, meaning perturbing their position slightly so that they do not pile up on top of each other and are all visible. You could remove this if you liked.
Exploratory data analysis should be fun! And I think visualization is a good place to start when beginning in statistics. Hopefully this will prove helpful to you.
Upvotes: 1