marta t

Reputation: 21

Count categorical variable ("yes") for each column

Can you please help me count how many "Yes" answers there are for each ingredient?

I have a data set:

beef  beet_broth  beef_liver  beer  chicken
Yes      Yes         No       Yes    No
No       Yes         No       Yes    No
No       No          Yes      Yes    No
Yes      Yes         No       Yes    No

I would like to know the number of "Yes" values in each column. Columns with a count of 0 should not appear in the results:

Beef - 2 
Beef_broth - 3
Beef_liver - 1
Beer - 4 

My full data set has 384 columns and 57,691 rows.

Upvotes: 2

Views: 14445

Answers (2)

Rich Scriven

Reputation: 99331

We can use colSums to find the number of "Yes" values per column (because TRUE equates to 1 and FALSE to zero), then subset for the values greater than zero.

cs <- colSums(recipes == "Yes")
cs[cs > 0]
#    beef beet_broth beef_liver       beer 
#       2          3          1          4 
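
For a reproducible check, the sample from the question can be rebuilt as a small data frame. This is only a sketch: the name `recipes` follows the code above, the column names are copied as posted, and `na.rm = TRUE` is an added assumption for data with missing values, which the question does not mention.

```r
# Rebuild the question's sample data (column names copied as posted).
recipes <- data.frame(
  beef       = c("Yes", "No", "No", "Yes"),
  beet_broth = c("Yes", "Yes", "No", "Yes"),
  beef_liver = c("No", "No", "Yes", "No"),
  beer       = c("Yes", "Yes", "Yes", "Yes"),
  chicken    = c("No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# recipes == "Yes" is a logical matrix; colSums counts the TRUEs per column.
# na.rm = TRUE (an assumption, not in the original answer) skips missing cells.
cs <- colSums(recipes == "Yes", na.rm = TRUE)

# Keep only the columns with at least one "Yes".
cs[cs > 0]
```

Because the comparison and the sum are both vectorised over the whole data frame, the same two lines should apply unchanged to the full 384-column data set.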

Upvotes: 4

John Coleman

Reputation: 51998

There is probably a more elegant way using plyr, but the following seems to be what you want:

> yesses = sapply(recipes,FUN = function(x){length(x[x=="Yes"])})
> yesses
      beef beet_broth beef_liver       beer    chicken 
         2          3          1          4          0 
> yesses[yesses > 0]
      beef beet_broth beef_liver       beer 
         2          3          1          4 

Edit: how it works. A data frame is a list of column vectors. sapply takes a list and a function, applies the function to each element of the list, and returns the results as a vector. Above, I used an anonymous function that uses logical subsetting to extract the entries of a column that equal "Yes". The length of the resulting subvector is the desired count. You could instead define this function first:

countYes = function(v){length(v[v=="Yes"])}

And then define yesses as:

yesses = sapply(recipes,countYes)

which works exactly as above.
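
A variant of the same idea, offered as a sketch rather than a correction, sums the logical vector instead of taking the length of a subset. The helper name `count_yes` is hypothetical, and the small `recipes` data frame below is illustrative, not the asker's data. One difference worth noting: `v[v == "Yes"]` keeps an `NA` entry for every missing value, so `length()` would count those as well, whereas `sum(..., na.rm = TRUE)` ignores them.

```r
# Illustrative data with one missing value (not from the question).
recipes <- data.frame(
  beef    = c("Yes", "No", NA, "Yes"),
  chicken = c("No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE counts as 1, and na.rm = TRUE drops missing entries.
count_yes <- function(v) sum(v == "Yes", na.rm = TRUE)

# vapply is like sapply but checks that each result is a single integer.
yesses <- vapply(recipes, count_yes, integer(1))

# Keep only the columns with at least one "Yes".
yesses[yesses > 0]
```

Using `vapply` with an explicit `integer(1)` template is a design choice here: it fails loudly if the function ever returns something other than one integer per column, where `sapply` would silently change the shape of its result.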

Disclaimer: I'm relatively new to R myself but have a lot of experience with Python. I typically think how I would solve a problem using a Python list comprehension and then paraphrase it in R, which typically involves some combination of subsetting and functions in the apply family. The resulting code works as desired, but might not be very idiomatic.

Upvotes: 1
