Can you please help me count how many "Yes" answers there are for each ingredient?
I have a data set:
beef  beet_broth  beef_liver  beer  chicken
Yes   Yes         No          Yes   No
No    Yes         No          Yes   No
No    No          Yes         Yes   No
Yes   Yes         No          Yes   No
I would like to know the number of "Yes" values in each column. Columns with 0 should not appear in the results:
Beef - 2
Beef_broth - 3
Beef_liver - 1
Beer - 4
My real data set has 384 columns and 57,691 rows.
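To make the example reproducible, here is a minimal sketch that rebuilds the sample table as a data frame. It assumes the data frame is called `recipes` (the name the answers use) and that the cells are stored as the character strings "Yes"/"No" rather than as factors or logicals.

```r
# Rebuild the sample table from the question.
# Assumption: cells are character strings "Yes"/"No".
recipes <- data.frame(
  beef       = c("Yes", "No",  "No",  "Yes"),
  beet_broth = c("Yes", "Yes", "No",  "Yes"),
  beef_liver = c("No",  "No",  "Yes", "No"),
  beer       = c("Yes", "Yes", "Yes", "Yes"),
  chicken    = c("No",  "No",  "No",  "No"),
  stringsAsFactors = FALSE
)

# Inspect the column types and first values.
str(recipes)
```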
We can use colSums to find the number of "Yes" values per column: recipes == "Yes" produces a logical matrix, and because TRUE equates to 1 and FALSE to 0, each column sum is the count of "Yes" values. Then we subset for the values greater than zero.
cs <- colSums(recipes == "Yes")
cs[cs > 0]
# beef beet_broth beef_liver beer
# 2 3 1 4
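One caveat, assuming the real data may contain missing values: recipes == "Yes" gives NA for those cells, and colSums then returns NA for the whole column. A sketch on a small hypothetical data frame with NAs, passing na.rm = TRUE and printing the result in the "name - count" layout from the question:

```r
# Hypothetical data with missing cells.
recipes <- data.frame(
  beef    = c("Yes", NA,    "No",  "Yes"),
  beer    = c("Yes", "Yes", "Yes", "Yes"),
  chicken = c("No",  "No",  NA,    "No"),
  stringsAsFactors = FALSE
)

# Logical matrix; NA cells stay NA.
is_yes <- recipes == "Yes"

# na.rm = TRUE skips NA cells instead of making the column sum NA.
cs <- colSums(is_yes, na.rm = TRUE)
cs <- cs[cs > 0]

# Print as "name - count", one line per ingredient.
cat(sprintf("%s - %d\n", names(cs), as.integer(cs)), sep = "")
```

This prints beef - 2 and beer - 4; chicken is dropped because it has no "Yes" values.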
There is probably a more elegant way using plyr, but the following seems to be what you want:
> yesses = sapply(recipes,FUN = function(x){length(x[x=="Yes"])})
> yesses
beef beet_broth beef_liver beer chicken
2 3 1 4 0
> yesses[yesses > 0]
beef beet_broth beef_liver beer
2 3 1 4
Edit: how it works. A data frame is a list of column vectors. sapply takes a list and a function, applies the function to each element of the list, and simplifies the results into a vector where possible. Above, I used an anonymous function that uses logical subsetting to extract the entries of a column that equal "Yes"; the length of the resulting subvector is the desired count. You could also define this function first, like this:
countYes = function(v){length(v[v=="Yes"])}
and then define yesses as:
yesses = sapply(recipes,countYes)
which works exactly as above.
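A variant of the same idea, as a sketch: sum() on a logical vector counts its TRUE values, so it can replace length() on a subset. It also avoids a subtle issue, since v[v == "Yes"] keeps an NA for every missing cell and length() would count those. The version below uses vapply, which, unlike sapply, checks that every column yields exactly one integer. The helper name count_yes and the small data frame are hypothetical.

```r
# Hypothetical sample data.
recipes <- data.frame(
  beef       = c("Yes", "No",  "No",  "Yes"),
  beef_liver = c("No",  "No",  "Yes", "No"),
  chicken    = c("No",  "No",  "No",  "No"),
  stringsAsFactors = FALSE
)

# Count "Yes" cells; na.rm = TRUE ignores missing values.
count_yes <- function(v) sum(v == "Yes", na.rm = TRUE)

# vapply stops with an error if any result is not a single integer.
yesses <- vapply(recipes, count_yes, FUN.VALUE = integer(1))

# Keep only ingredients with at least one "Yes".
yesses[yesses > 0]
```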
Disclaimer: I'm relatively new to R myself but have a lot of experience with Python. I usually think about how I would solve a problem with a Python list comprehension and then paraphrase it in R, which typically involves some combination of subsetting and functions from the apply family. The resulting code works as desired, but might not be very idiomatic.