Reputation: 51
I have a dataset in long format in which each participant undergoes two conditions of an experiment (repeated measures), and each condition consists of a number of trials. Each participant has one score (Score) per condition/group, but an individual reaction time (RT) per trial.
The dataset looks like this:
library(tidyverse)
df <- data.frame(
  ID = c(rep(1, 6), rep(2, 6), rep(3, 6)),
  Gender = factor(c(rep("M", 6), rep("M", 6), rep("F", 6))),
  Group = factor(c(rep(c(rep(0, 3), rep(1, 3)), 3))),
  Trial = factor(rep(c(1:3), 6)),
  Score = c(rep(10, 3), rep(20, 3), rep(15, 3), rep(25, 3), rep(18, 3), rep(12, 3)),
  RT = runif(18)
)
I wanted to do some plotting to explore the data and focus on the analysis of the score, which is simpler at this stage. The problem is that a row does not represent a single case for Score: it is RT, measured once per trial, that determines how the dataset is divided into rows. For example, if I plot a bar chart with the counts per Gender, I end up with a total of 18 cases rather than the 3 participants there really are.
ggplot(data = df, aes(Gender)) +
  geom_bar()
I thought one way to simplify the dataset would be to have each row hold the mean/median RT per participant (within each condition), but this would involve splitting my dataset in two, and I would prefer to keep that as a last resort. In addition, it would not solve my problem, as there would still be two Gender rows per participant.
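To make that option concrete, here is a minimal sketch of the collapse, assuming dplyr is available. The names df_cond, mean_RT and median_RT are made up for illustration, and the data frame is rebuilt here only so the snippet runs on its own.

```r
library(dplyr)

# Same structure as the example data above (RT values are random)
df <- data.frame(
  ID = rep(1:3, each = 6),
  Gender = factor(rep(c("M", "M", "F"), each = 6)),
  Group = factor(rep(rep(0:1, each = 3), times = 3)),
  Trial = factor(rep(1:3, times = 6)),
  Score = rep(c(10, 20, 15, 25, 18, 12), each = 3),
  RT = runif(18)
)

# Collapse the trials: one row per participant and condition.
# Score is constant within ID x Group, so grouping by it just carries it along.
df_cond <- df %>%
  group_by(ID, Gender, Group, Score) %>%
  summarise(mean_RT = mean(RT), median_RT = median(RT), .groups = "drop")

nrow(df_cond)  # number of participant-by-condition rows
```

Even after this step every participant still appears twice (once per Group), so a bar chart of Gender on df_cond would still over-count.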
I know this has to be simple, but I am having trouble formulating this issue as I am still a newbie in R.
I appreciate any help!
Upvotes: 0
Views: 225
Reputation: 389145
Since you have multiple rows for each ID, keep only the unique combinations of ID and Gender before plotting, so that each participant is counted once. Something like this:
library(dplyr)
library(ggplot2)
df %>% distinct(ID, Gender) %>% ggplot(aes(Gender)) + geom_bar()
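If you want to check the numbers behind the bars, the same idea can be sketched as a count table. This assumes the example data from the question (rebuilt here so the snippet is self-contained); gender_counts is just an illustrative name.

```r
library(dplyr)

# Example data with the same structure as in the question
df <- data.frame(
  ID = rep(1:3, each = 6),
  Gender = factor(rep(c("M", "M", "F"), each = 6)),
  Group = factor(rep(rep(0:1, each = 3), times = 3)),
  Trial = factor(rep(1:3, times = 6)),
  Score = rep(c(10, 20, 15, 25, 18, 12), each = 3),
  RT = runif(18)
)

# One row per participant, then count participants per gender
gender_counts <- df %>%
  distinct(ID, Gender) %>%
  count(Gender)

gender_counts
```

With this data the counts add up to 3 participants (2 M, 1 F) instead of 18 rows. The original df is left untouched, so the trial-level RT values remain available for later analysis.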
Upvotes: 1