help me

Reputation: 35

How to find probability of dataset in R

I have a dataset like the one below (only a small part is shown). How can I calculate the probability of rain by month and show it with barplot?

Date           Rain Today
2020-01-01     Yes
2020-01-02     No
2020-01-03     Yes
2020-01-04     Yes
2020-01-05     No
...            ...
2020-12-31     Yes

Upvotes: 0

Views: 630

Answers (2)

Heavy Breathing Cat

Reputation: 75

EDIT: Correct answer in the comments

I don't know why you would want to use a barplot for this, but, based on this post, you can use a dplyr pipeline to do something like this:

library(dplyr)

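df %>% 
  # as.Date() is a no-op if Date is already a Date, and converts it if it is character
  group_by(month = format(as.Date(Date), "%Y-%m")) %>%
  # Mean of a logical vector = fraction of days with rain
  summarise(probability = mean(`Rain Today` == 'Yes'))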

This groups your data by month. Within each month, `Rain Today` == 'Yes' is TRUE on rainy days and FALSE otherwise, so its mean is the fraction of days on which it rained, i.e. the estimated probability of rain for that month.
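To see why taking the mean gives a probability, here is a minimal sketch using only the five January rows shown in the question. The vector names rain and rained are made up for illustration.

```r
# The five January rows shown in the question
rain <- c("Yes", "No", "Yes", "Yes", "No")

# The comparison gives a logical vector: TRUE where it rained
rained <- rain == "Yes"

# TRUE counts as 1 and FALSE as 0, so the mean is the share of rainy days
p_rain <- mean(rained)
print(p_rain)  # 0.6 (3 rainy days out of 5)
```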

Thank you to everyone in the comments for pointing it out. I hope this helps.

Upvotes: 1

David

Reputation: 365

The lubridate package has some great functions that help you deal with dates.

install.packages("lubridate") # Only needed once
df$month <- lubridate::month(df$Date) # Month number, 1 to 12
tapply(df[,"Rain Today"]=="Yes", df$month, mean) # Fraction of rainy days per month

You may need to execute df$Date <- as.Date(df$Date) first if it's currently stored as character rather than as a date.
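A quick way to check whether the conversion is needed is to look at class(df$Date). The sketch below uses a hypothetical two-row data frame, not the asker's data, and assumes the dates are in the YYYY-MM-DD form shown in the question.

```r
# Hypothetical two-row data frame with Date stored as character
df <- data.frame(Date = c("2020-01-01", "2020-02-15"),
                 stringsAsFactors = FALSE)

class(df$Date)               # "character"

# "YYYY-MM-DD" strings parse without an explicit format argument
df$Date <- as.Date(df$Date)
class(df$Date)               # "Date"

# Month number using base R only
month_num <- as.integer(format(df$Date, "%m"))
print(month_num)             # 1 2
```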

If you don't want to have any dependencies, then I think you can get what you want like this:

df$month <- substr(df$Date, start=6, stop=7) #Get the 6th and 7th characters of your date strings, which correspond to the "month" part
tapply(df[,"Rain Today"]=="Yes", df$month, mean)
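Since the question asks for a barplot, the result of tapply can probably be passed straight to base R's barplot(). This is only a sketch: the four rows of data below are invented to make it runnable, and the axis labels are my own choice.

```r
# Toy data in the same shape as the question (values are made up)
df <- data.frame(
  Date = c("2020-01-01", "2020-01-02", "2020-02-01", "2020-02-02"),
  `Rain Today` = c("Yes", "No", "Yes", "Yes"),
  check.names = FALSE,       # keep the space in the column name
  stringsAsFactors = FALSE
)

# Characters 6 and 7 of "YYYY-MM-DD" are the month
df$month <- substr(df$Date, start = 6, stop = 7)

# Fraction of rainy days per month
prob <- tapply(df[, "Rain Today"] == "Yes", df$month, mean)

# One bar per month, with the y axis fixed to the 0 to 1 probability range
barplot(prob, ylim = c(0, 1), xlab = "Month", ylab = "Probability of rain")
```

With the full year of data, prob would have twelve entries and the plot twelve bars.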

Upvotes: 0
