Reputation: 505
I am trying to calculate the number of non-NA values in each column and then aggregate those counts by an adjacent date-time column. For example, take this data set (let's call it df):
## DateTime V1 V2 V3
01 02-10-2018 fire 1 NA
02 02-10-2018 water 4 NA
03 02-10-2018 fire 2 NA
04 02-10-2018 fire 2 NA
05 02-10-2018 water 8 NA
06 02-11-2018 water NA NA
07 02-11-2018 fire 4 NA
08 02-12-2018 earth 4 NA
09 02-13-2018 fire NA NA
10 02-13-2018 fire NA NA
11 02-13-2018 fire 4 NA
And I want to get out a table in this format:
## DateTime V1 V2 V3
01 02-10-2018 5 5 0
02 02-11-2018 2 1 0
03 02-12-2018 1 1 0
04 02-13-2018 3 1 0
I have tried several solutions, but what keeps throwing me off is that some rows are full of NA values, which produces empty data sets and keeps messing up the code.
Attempted Solutions:
data.frame(table(df$DateTime, df$V1))
And
df %>%
  select(df$DateTime, df$V1) %>%
  filter(!is.na(df$V1)) %>%
  group_by(df$DateTime) %>%
  mutate(V1.count = n()) %>%
  slice(1)
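For reference, the pipe above presumably fails because dplyr verbs expect bare column names, not df$ references (select(df$DateTime, ...) passes the vector itself rather than the column name). A hedged sketch of what the attempt seems to be aiming for, counting only V1 and assuming dplyr is loaded:

library(dplyr)

df %>%
  filter(!is.na(V1)) %>%        # keep rows where V1 is present
  group_by(DateTime) %>%        # one group per date
  summarise(V1.count = n())     # count the remaining rows per date

This still handles only one column at a time; the answers below generalise the counting to every column at once.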
Upvotes: 1
Views: 89
Reputation: 32558
aggregate(df1[-1], df1[1], function(x) sum(!is.na(x)))
#OR
aggregate(df1[!names(df1) %in% "DateTime"], df1["DateTime"], function(x) sum(!is.na(x)))
# DateTime V1 V2 V3
#1 02-10-2018 5 5 0
#2 02-11-2018 2 1 0
#3 02-12-2018 1 1 0
#4 02-13-2018 3 1 0
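Roughly, the call can be read as follows: df1[-1] is every column except the first, i.e. the value columns; df1[1] is a one-column data frame supplying the grouping variable (aggregate's by argument must be a list, and a data frame qualifies); and the anonymous function is applied to each value column within each DateTime group:

# df1[-1] -> the value columns V1, V2, V3
# df1[1]  -> the grouping column DateTime, kept as a (one-column) data frame
# FUN     -> counts the non-NA entries of each column within each group
aggregate(df1[-1], df1[1], FUN = function(x) sum(!is.na(x)))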
DATA
df1 = structure(list(DateTime = c("02-10-2018", "02-10-2018", "02-10-2018",
"02-10-2018", "02-10-2018", "02-11-2018", "02-11-2018", "02-12-2018",
"02-13-2018", "02-13-2018", "02-13-2018"), V1 = c("fire", "water",
"fire", "fire", "water", "water", "fire", "earth", "fire", "fire",
"fire"), V2 = c(1L, 4L, 2L, 2L, 8L, NA, 4L, 4L, NA, NA, 4L),
V3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("DateTime",
"V1", "V2", "V3"), class = "data.frame", row.names = c("01",
"02", "03", "04", "05", "06", "07", "08", "09", "10", "11"))
Upvotes: 2
Reputation: 2650
This is one solution:
df %>%
  group_by(DateTime) %>%
  summarise(V1 = sum(!is.na(V1)),
            V2 = sum(!is.na(V2)),
            V3 = sum(!is.na(V3)))
Upvotes: 0
Reputation: 3311
You can use summarise_at from dplyr:
library(dplyr)
count_nas <- function(x) sum(!is.na(x))
my_df %>%
  group_by(V2) %>%
  summarise_at(vars(V3:V5), count_nas)
# # A tibble: 4 x 4
# V2 V3 V4 V5
# <chr> <int> <int> <int>
# 1 02-10-2018 5 5 0
# 2 02-11-2018 2 1 0
# 3 02-12-2018 1 1 0
# 4 02-13-2018 3 1 0
A more concise version is to simply create the function inline:
my_df %>%
  group_by(V2) %>%
  summarise_at(vars(V3:V5), funs(sum(!is.na(.))))
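On newer dplyr releases where funs() is deprecated, a roughly equivalent sketch (assuming dplyr 1.0 or later, with the same my_df and column names as above) uses across() inside summarise():

library(dplyr)

my_df %>%
  group_by(V2) %>%
  summarise(across(V3:V5, ~ sum(!is.na(.x))))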
Upvotes: 3