Reputation: 35
I have a dataset like the one below (only a small part is shown). How can I calculate the probability of rain for each month and show it with a barplot?
Date       Rain Today
2020-01-01 Yes
2020-01-02 No
2020-01-03 Yes
2020-01-04 Yes
2020-01-05 No
...        ...
2020-12-31 Yes
Upvotes: 0
Views: 630
Reputation: 75
EDIT: Corrected answer, thanks to the comments.
I don't know why you would want to use a plot to calculate this, but, based on this post, you can use a dplyr pipeline like this:
library(dplyr)
df %>%
  # assumes df$Date is already of class Date
  group_by(month = format(Date, "%Y-%m")) %>%
  summarise(probability = mean(`Rain Today` == 'Yes'))
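This groups the data by month and, within each month, takes the mean of the logical test `Rain Today` == 'Yes'. Because TRUE counts as 1 and FALSE as 0, that mean is the share of rainy days, i.e. the estimated probability of rain for that month. The "%Y-%m" key keeps the same month in different years separate.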
Thanks to everyone in the comments for pointing this out. I hope this helps.
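If you would rather not load dplyr, base R's aggregate() can compute the same per-month proportion. This is a minimal sketch, not the answer's own code: it builds df from only the six fully shown rows of the sample (the elided rows are not reconstructed), and the helper column name rained is hypothetical.

```r
# Minimal sketch: only the six fully shown rows from the question
df <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-01-04", "2020-01-05", "2020-12-31")),
  `Rain Today` = c("Yes", "No", "Yes", "Yes", "No", "Yes"),
  check.names = FALSE  # keep the space in "Rain Today"
)

# Year-month key (as in the dplyr version) and a TRUE/FALSE rain indicator
df$month  <- format(df$Date, "%Y-%m")
df$rained <- df[["Rain Today"]] == "Yes"

# Mean of a logical = share of rainy days in each group
res <- aggregate(rained ~ month, data = df, FUN = mean)
res  # 2020-01: 3 of 5 days -> 0.6; 2020-12: 1 of 1 day -> 1
```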
Upvotes: 1
Reputation: 365
The lubridate package has some great functions for working with dates.
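install.packages("lubridate")
df$month <- lubridate::month(df$Date)  # month number 1-12, pooled across years
# [[ ]] returns a plain vector for both data frames and tibbles
tapply(df[["Rain Today"]] == "Yes", df$month, mean)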
You may need to run df$Date <- as.Date(df$Date) first if Date is currently stored as character rather than as a date.
If you don't want any dependencies, you can get the same result in base R:
# Characters 6-7 of a "YYYY-MM-DD" date are the month part
df$month <- substr(df$Date, start = 6, stop = 7)
tapply(df[["Rain Today"]] == "Yes", df$month, mean)
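The question asks for a barplot, and the named vector that tapply() returns can be passed straight to barplot(). A minimal sketch, assuming df holds only the six fully shown sample rows with Date kept as "YYYY-MM-DD" strings; p_rain is a hypothetical name, and month.abb is base R's vector of month abbreviations.

```r
# Minimal sketch: only the six fully shown rows from the question
df <- data.frame(
  Date = c("2020-01-01", "2020-01-02", "2020-01-03",
           "2020-01-04", "2020-01-05", "2020-12-31"),
  `Rain Today` = c("Yes", "No", "Yes", "Yes", "No", "Yes"),
  check.names = FALSE  # keep the space in "Rain Today"
)

# Base-R approach from above: month = characters 6-7 of "YYYY-MM-DD"
df$month <- substr(df$Date, start = 6, stop = 7)
p_rain <- tapply(df[["Rain Today"]] == "Yes", df$month, mean)

# One bar per month present in the data, labelled "Jan" ... "Dec"
barplot(p_rain,
        names.arg = month.abb[as.integer(names(p_rain))],
        ylim = c(0, 1),
        xlab = "Month", ylab = "Probability of rain",
        main = "Share of rainy days by month")
```

With this sample only January and December get bars; a full year of data gives twelve. Because the month key drops the year, several years of data are pooled into one bar per calendar month; group by format(Date, "%Y-%m") instead if you need them separate.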
Upvotes: 0