Reputation: 145
I have a CSV like this, saved as an object in R named df1.
X Y Z Year
0 2 4 2014
3 1 3 2014
5 4 0 2014
0 3 0 2014
2 1 0 2015
I want to:
Count each column where there are no "0" for year 2014. For example, for column X, the count = 2 (not 3 because I want 2014 data only). For column Y the count is 4. For column Z the count is 1.
Sum all the counts for each column
This is what I tried:
count_total <- sum(df1$x != 0 &
df1$y != 0 &
df1&z != 0 &
df1$Year == 2014)
count_total
I want the output to be simply be 1 (i.e. the 2nd row in df has no 0's)
However, this does not align with my countifs on excel. In excel, it's like this:
=COUNTIFS('df1'!$A$2:$A$859,"<>0",'df1'!$B$2:$B$859,"<>0",
'df1'!$C$2:$C$859,"<>0",'df1'!$D$2:$D$859,2014)
Wondering if I mistyped something on R? I'm a dyplr user but can't find anything particularly useful on google.
Thank you very much!
Upvotes: 2
Views: 1290
Reputation: 2246
dplyr
approach:
library(dplyr)
df1 %>%
group_by(Year) %>%
summarise_at(vars(X:Z), function (x) sum(x != 0))
Output:
# A tibble: 2 x 4
# Year X Y Z
# <int> <int> <int> <int>
# 1 2014 2 4 2
# 2 2015 1 1 0
Upvotes: 3
Reputation: 4970
Alternative using summaryBy
.
library(doBy)
summaryBy(list(c('X','Y','Z'), c('Year')), df1, FUN= function(x) sum(x!=0), keep.names=T)
Year X Y Z
1 2014 2 4 2
2 2015 1 1 0
When needed use colSums
as explained before.
Upvotes: 2
Reputation: 76402
You can do this with aggregate
then colSums
to get the totals by column.
agg <- aggregate(. ~ Year, df1, function(x) sum(x != 0))
agg
# Year X Y Z
#1 2014 2 4 2
#2 2015 1 1 0
colSums(agg[-1])
#X Y Z
#3 5 2
Data.
df1 <- read.table(text = "
X Y Z Year
0 2 4 2014
3 1 3 2014
5 4 0 2014
0 3 0 2014
2 1 0 2015
",header = TRUE)
Upvotes: 3
Reputation: 388817
One way is using rowSums
on subset
of data
sum(rowSums(subset(df1, Year == 2014) == 0) == 0)
#[1] 1
Upvotes: 4