JVDeasyas123

Reputation: 273

Is there an equivalent to a "count if" function in R?

This is likely a very simple question, but I am trying to create a for loop that would 1) find the percentage of "yes" in a column of "yes"/"no" values for each variable in my df, and 2) add that answer to a list to be used in a graph later.

For example:

j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")

df <- data.frame(j,v,d)
df

frequency <- list()

for (i in df){
  if(i == "Checked") {
    freq <- #countif(df$i == "yes")/count(df, i)
    frequency[[paste0("element", i)]] <- freq
  }
}

I believe I am very far off, as 1) I do not know how to count only specific values in a column, and 2) I am not able to add the output to a list.

My ideal output would be a list of "yes"/total fractions, one for each variable in df. I then want to use these frequencies and the variable names to create a bar chart (not directly related to this question, but some context as to why I am trying to accomplish this task).

Thank you for your help!

Upvotes: 2

Views: 347

Answers (3)

ThomasIsCoding

Reputation: 102625

Another base R option is using prop.table, e.g.,

Map(function(x) prop.table(table(x))["yes"],df)

which gives

$j
yes
0.5

$v
 yes
0.25

$d
 yes
0.75
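One caveat worth checking: if a column contains no "yes" at all, table(x) has no "yes" cell, so indexing with ["yes"] returns NA rather than 0. A minimal sketch, assuming the only possible answers are "yes" and "no", fixes the levels with factor() so a missing category is counted as 0 (the column k and the data frame df2 are hypothetical test data, not from the question):

```r
j <- c("yes", "no", "no", "yes")
k <- c("no", "no", "no", "no")   # hypothetical column with no "yes"
df2 <- data.frame(j, k)

# Without fixed levels, table(k) has no "yes" cell, so the lookup gives NA
unfixed <- Map(function(x) prop.table(table(x))["yes"], df2)

# With levels fixed to c("yes", "no"), the "yes" cell always exists (possibly 0)
fixed <- Map(function(x) {
  prop.table(table(factor(x, levels = c("yes", "no"))))[["yes"]]
}, df2)
fixed
```

Using [["yes"]] also drops the "yes" name, so each list element is a plain number.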

Upvotes: 0

akrun

Reputation: 887851

In base R, we can do this directly without a loop, i.e., take the column means of the logical matrix (df == 'yes'), where each TRUE counts as 1 and each FALSE as 0

colMeans(df == 'yes')
#   j    v    d 
#0.50 0.25 0.75 
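Since the question asks for a "count if", it may help to see the count itself: comparing the data frame with "yes" gives a logical matrix, and colSums() counts the TRUE values per column. Dividing by nrow(df) then gives the same proportions as colMeans(). A minimal sketch using the question's data (yes_counts is just an illustrative name):

```r
j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")
df <- data.frame(j, v, d)

# "count if": number of "yes" entries per column (TRUE counts as 1)
yes_counts <- colSums(df == "yes")
yes_counts   # j = 2, v = 1, d = 3

# proportion = count / number of rows, i.e. the same as colMeans(df == "yes")
yes_counts / nrow(df)
```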

If we need a list as output, wrap with as.list

as.list(colMeans(df == 'yes'))
#$j
#[1] 0.5

#$v
#[1] 0.25

#$d
#[1] 0.75

Or with dplyr

library(dplyr)
df %>% 
     summarise(across(everything(), ~ mean(. == 'yes')))
#    j    v    d
#1 0.5 0.25 0.75

Or using a for loop

frequency <- vector('list', ncol(df))
names(frequency) <- paste0("element", seq_along(frequency))
for(i in seq_along(df)) frequency[[i]] <- mean(df[[i]] == "yes")
frequency
#$element1
#[1] 0.5

#$element2
#[1] 0.25

#$element3
#[1] 0.75
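Because the end goal is a bar chart labelled by variable, it may be more convenient to key the list by the original column names rather than element1, element2, and so on. A sketch under that assumption, reusing the question's data and looping over names(df) instead of positions:

```r
j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")
df <- data.frame(j, v, d)

frequency <- list()
for (nm in names(df)) {
  # store each column's share of "yes" under that column's own name
  frequency[[nm]] <- mean(df[[nm]] == "yes")
}
str(frequency)
```

unlist(frequency) then turns the list into a named numeric vector, which base R's barplot() accepts as bar heights, using the names as bar labels.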

Upvotes: 0

Duck

Reputation: 39613

I think you are looking for this. To get your loop working, it is better to iterate over a column index. You can then count the elements equal to "yes" using length() and which(), and divide by the number of elements in each column using nrow(). Here is the code:

#Data
j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")
df <- data.frame(j, v, d, stringsAsFactors = FALSE)
df
#List
frequency <- list()
#Loop over column positions
for (i in seq_len(ncol(df))) {
    # count the "yes" entries in column i, then divide by the number of rows
    freq <- length(which(df[, i] == "yes")) / nrow(df[, i, drop = FALSE])
    frequency[[paste0("element", i)]] <- freq
}

Output:

frequency
$element1
[1] 0.5

$element2
[1] 0.25

$element3
[1] 0.75
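The length(which(...)) idiom works, but sum() on the logical vector is a more direct "count if". The two differ when a column contains NA: which() silently drops NA comparisons, while sum() returns NA unless na.rm = TRUE is given. A small sketch with a hypothetical column x containing a missing answer:

```r
x <- c("yes", NA, "no", "yes")   # hypothetical column with a missing answer

count_which  <- length(which(x == "yes"))      # which() skips the NA comparison
count_sum    <- sum(x == "yes")                # NA propagates through sum()
count_sum_rm <- sum(x == "yes", na.rm = TRUE)  # drop NA before counting

c(which = count_which, sum = count_sum, sum_na_rm = count_sum_rm)
```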

Upvotes: 1
