Reputation: 273
This is likely a very simple question, but I am trying to write a for loop that would 1) find the percentage of "yes" in each "yes"/"no" column of my df and 2) add that result to a list to be used in a graph later.
So for example
j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")
df <- data.frame(j,v,d)
df
frequency <- list()
for (i in df){
if(i == "Checked") {
freq <- #countif(df$i == "yes")/count(df, i)
frequency[[paste0("element", i)]] <- freq
}
}
I believe I am very far off, as 1) I do not know how to count only specific values in a column and 2) I am not able to add the output to a list.
My ideal output would be a list of fractions of "yes"/"total" for each of the variables in df. I then want to use the frequencies and variable names to create a bar chart (not directly related to this question, but some context as to why I am trying to accomplish this task).
Thank you for your help!
Upvotes: 2
Views: 347
Reputation: 102625
Another base R option is using prop.table, e.g.,
Map(function(x) prop.table(table(x))["yes"],df)
which gives
$j
yes
0.5
$v
yes
0.25
$d
yes
0.75
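One caveat, offered as a hedged sketch: indexing the table with ["yes"] returns NA for a column that contains no "yes" at all. Assuming such a column should count as 0, converting each column to a factor with both levels fixed keeps the "yes" cell in the table. The data frame df2 and the helper prop_yes are hypothetical names used only for illustration.

```r
# Hypothetical data: column `a` contains no "yes" at all
df2 <- data.frame(j = c("yes", "no", "no", "yes"),
                  a = c("no", "no", "no", "no"),
                  stringsAsFactors = FALSE)

# Fixing the factor levels keeps a "yes" cell even when its count is zero;
# as.numeric() drops the table class and the name
prop_yes <- function(x) {
  tab <- table(factor(x, levels = c("no", "yes")))
  as.numeric(prop.table(tab)["yes"])
}

res <- Map(prop_yes, df2)
res
```

Here res$j is 0.5 and res$a is 0 rather than NA.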
Upvotes: 0
Reputation: 887851
In base R, we can do this directly instead of using a loop, i.e. get the column means of the logical matrix (df == 'yes')
colMeans(df == 'yes')
# j v d
#0.50 0.25 0.75
If we need a list as output, wrap with as.list
as.list(colMeans(df == 'yes'))
#$j
#[1] 0.5
#$v
#[1] 0.25
#$d
#[1] 0.75
Or with dplyr
library(dplyr)
df %>%
summarise(across(everything(), ~ mean(. == 'yes')))
# j v d
#1 0.5 0.25 0.75
Or using a for loop
frequency <- vector('list', ncol(df))
names(frequency) <- paste0("element", seq_along(frequency))
for(i in seq_along(df)) frequency[[i]] <- mean(df[[i]] == "yes")
frequency
#$element1
#[1] 0.5
#$element2
#[1] 0.25
#$element3
#[1] 0.75
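Since the question mentions a bar chart as the end goal, here is a minimal sketch of how the named vector from colMeans could feed base R's barplot(). The axis labels and the ylim range are illustrative choices, not something the question specifies.

```r
j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")
df <- data.frame(j, v, d)

# Named numeric vector: one proportion of "yes" per column
freq <- colMeans(df == "yes")

# barplot() uses the names of the vector as bar labels
barplot(freq, ylim = c(0, 1),
        xlab = "variable", ylab = "proportion of yes")
```

Keeping the result as a named vector rather than a list avoids an extra unlist() step before plotting.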
Upvotes: 0
Reputation: 39613
I think you are looking for this. To get your loop working, it is better to iterate over a column index. You can then count the elements equal to "yes" using length() and which(), and divide by the total number of elements in each column using nrow(). Here is the code:
#Data
j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")
df <- data.frame(j,v,d,stringsAsFactors = FALSE)
df
#List
frequency <- list()
#Loop
for (i in seq_len(ncol(df))){
freq <- length(which(df[,i] == "yes"))/nrow(df)
frequency[[paste0("element", i)]] <- freq
}
Output:
frequency
$element1
[1] 0.5
$element2
[1] 0.25
$element3
[1] 0.75
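If the list should carry the variable names instead of element1, element2, and so on (which may be handy as labels for the bar chart mentioned in the question), one variation is to loop over names(df). This is a minimal sketch rather than the only way to do it; sum() on the logical vector is swapped in for length(which()) and gives the same count as long as the columns contain no NA values.

```r
j <- c("yes", "no", "no", "yes")
v <- c("no", "no", "no", "yes")
d <- c("yes", "no", "yes", "yes")
df <- data.frame(j, v, d, stringsAsFactors = FALSE)

frequency <- list()
for (nm in names(df)) {
  # sum() counts the TRUE values; nrow(df) is the number of rows
  frequency[[nm]] <- sum(df[[nm]] == "yes") / nrow(df)
}
frequency
```

The resulting list has elements j, v and d, so unlist(frequency) gives a named vector ready for plotting.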
Upvotes: 1