RmyjuloR

Reputation: 438

ggplot2: fail replicating plot from 2 datasets

I'm trying to replicate the last plot from this example: https://www.r-bloggers.com/plotting-individual-observations-and-group-means-with-ggplot2/

It works when I use the same code and data as in the example. However, when I try it with my own data, it fails.

My original data is in long format:

> head(summpas)
id session paradigm  N    mean        sd       se min firstq median  thirdq  max
1  1      s1 baseline 20  831.00  692.7155 154.8959  95 326.50  585.5 1327.50 2433
2  1      s1    post1 20 1344.65 1261.5589 282.0931 107 315.25 1008.5 2105.00 4621
3  1      s1    post2 20 1058.05  856.6661 191.5564 105 144.50 1064.0 1915.25 2427
4  1      s1    post3 20 1318.00 1016.1804 227.2248  95 381.75 1289.5 1741.50 3688
6  1      s2 baseline 20 1058.20 1118.8923 250.1919  10 131.00  314.5 1984.25 3042
7  1      s2    post1 20 1909.65 1478.1206 330.5178  59 760.50 1465.0 2808.00 4602

Summarizing as described in the example doesn't work for me:

> meansummpas <- summpas %>%
    group_by(session, paradigm) %>% 
    summarise(mean = mean(mean))
> meansummpas
mean
1 949.5366

So instead I use:

library(plyr)
meansummpas <- ddply(summpas, c("session", "paradigm"), summarise,
                     mean=mean(mean))

Now I try the plot:

library(ggplot2)
ggplot(summpas, aes(x=paradigm, y=mean, group=id, colour=session)) +
  geom_line(aes(group=session), alpha=.3) +
  geom_line(data=meansummpas, alpha=.8, size=3)

But I get the error:

Don't know how to automatically pick scale for object of type 
tbl_df/tbl/data.frame. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (8): x, y, 
group, colour

What I have noticed is that the data from the example and my data do not have exactly the same class (this also applies to the unsummarized data):

class(gd)
[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
class(meansummpas)
[1] "data.frame"

Why do I get this error? What am I doing wrong? :) Many thanks!!

Upvotes: 0

Views: 62

Answers (1)

MrGumble

Reputation: 5776

It's unclear what you are plotting, as you already seem to have summarised each paradigm/session.

The 'one observation per group' issue seems to arise because your x-variable is categorical; this apparently implies a grouping, with each category forming its own group.
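To illustrate the grouping point, here is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from the question). It assumes ggplot2's default behaviour of deriving groups from the discrete variables in the plot when no `group` aesthetic is given.

```r
library(ggplot2)

# Hypothetical toy data: one row per paradigm level
toy <- data.frame(
  paradigm = c("baseline", "post1", "post2", "post3"),
  mean     = c(800, 1300, 1050, 1300)
)

# No explicit group: every level of the discrete x becomes its own group
# holding a single point, so geom_line() has nothing to connect
p_bad <- ggplot(toy, aes(x = paradigm, y = mean)) +
  geom_line()

# group = 1 puts all rows into one group, so a line is drawn across levels
p_ok <- ggplot(toy, aes(x = paradigm, y = mean)) +
  geom_line(aes(group = 1))
```

Printing `p_bad` should give an empty panel together with a message about groups of one observation, while `p_ok` draws a single connected line.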

I did manage to get a plot that averages across all means, but I had to add rows 8 and 9.

summpas <- read.table(text='id session paradigm  N    mean        sd       se min firstq median  thirdq  max
1  1      s1 baseline 20  831.00  692.7155 154.8959  95 326.50  585.5 1327.50 2433
2  1      s1    post1 20 1344.65 1261.5589 282.0931 107 315.25 1008.5 2105.00 4621
3  1      s1    post2 20 1058.05  856.6661 191.5564 105 144.50 1064.0 1915.25 2427
4  1      s1    post3 20 1318.00 1016.1804 227.2248  95 381.75 1289.5 1741.50 3688
6  1      s2 baseline 20 1058.20 1118.8923 250.1919  10 131.00  314.5 1984.25 3042
7  1      s2    post1 20 1909.65 1478.1206 330.5178  59 760.50 1465.0 2808.00 4602
8  1      s2    post2 20 1060.20 1118.8923 250.1919  10 131.00  314.5 1984.25 3042
9  1      s2    post3 20 1980.20 1118.8923 250.1919  10 131.00  314.5 1984.25 3042
', header=TRUE, as.is=TRUE)
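library(ggplot2)
# One coloured path per session, plus one line averaging the means per paradigm.
# `fun` is the argument name in ggplot2 >= 3.3.0; older versions use `fun.y`.
# group=1 forces a single group so the averaged line spans all paradigms.
ggplot(summpas, aes(x=paradigm, y=mean)) +
  geom_path(aes(colour=session, group=session)) +
  stat_summary(fun=mean, geom='line', aes(group=1))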
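As for the single-row result from `summarise()` in the question: a likely cause (an assumption, since the post does not show the order in which packages were loaded) is that plyr was attached after dplyr, so `summarise()` resolved to `plyr::summarise()`, which ignores dplyr's grouping. A minimal sketch on made-up data (the `toy` data frame, its values, and the name `group_means` are hypothetical) calls the dplyr functions explicitly and gives the group-mean layer its own `group` aesthetic, because the summarised data has no `id` column to inherit:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data shaped like summpas: two ids, two sessions, two paradigms
toy <- data.frame(
  id       = rep(1:2, each = 4),
  session  = rep(c("s1", "s2"), times = 4),
  paradigm = rep(c("baseline", "post1"), each = 2, times = 2),
  mean     = c(800, 1000, 1300, 1900, 900, 1100, 1200, 1700)
)

# Namespace-qualified calls, so a later library(plyr) cannot mask them
group_means <- toy %>%
  dplyr::group_by(session, paradigm) %>%
  dplyr::summarise(mean = mean(mean)) %>%
  dplyr::ungroup()

# Faint lines: one per id within each session.
# Stronger lines: the group means, grouped by session only.
p <- ggplot(toy, aes(x = paradigm, y = mean, colour = session)) +
  geom_line(aes(group = interaction(id, session)), alpha = 0.3) +
  geom_line(data = group_means, aes(group = session), alpha = 0.8)
```

With this toy data, `group_means` has one row per session/paradigm combination rather than a single overall mean.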

Upvotes: 1
