Reputation: 3
This is the sample data.
```
created_date start_date
2014-12-11   2014-12-10
2014-12-11   2014-12-11
2014-12-12   2014-12-13
2014-12-13   NULL
2014-12-13   2014-12-13
2014-12-13   2014-12-13
2014-12-23   NULL
2014-12-23   NULL
```
I'd like to count, for each created_date, how many start_date values are filled in (i.e. not NULL). The actual value of start_date doesn't matter; only the number of non-NULL start_dates is meaningful.
For this sample, the result should look like this:
```
created_date count
2014-12-11   2
2014-12-12   1
2014-12-13   2
2014-12-23   0
```
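To make the question reproducible, here is one way to build the sample data in R. This is a sketch that assumes the `NULL` entries are really missing values (`NA`) and that the dates are stored as character strings; the `expected` data frame simply restates the desired result from above.

```r
# Sample data from the question; NULL start dates are assumed to be NA,
# and dates are kept as character strings for simplicity
x <- data.frame(
  created_date = c("2014-12-11", "2014-12-11", "2014-12-12", "2014-12-13",
                   "2014-12-13", "2014-12-13", "2014-12-23", "2014-12-23"),
  start_date   = c("2014-12-10", "2014-12-11", "2014-12-13", NA,
                   "2014-12-13", "2014-12-13", NA, NA),
  stringsAsFactors = FALSE
)

# Desired result: number of non-missing start_date values per created_date
expected <- data.frame(
  created_date = c("2014-12-11", "2014-12-12", "2014-12-13", "2014-12-23"),
  count        = c(2L, 1L, 2L, 0L),
  stringsAsFactors = FALSE
)
```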
I can't simply use table(), because table(created_date) counts the rows for each created_date, not the start_dates:
```
> table(created_date)
created_date count
2014-12-11   2
2014-12-12   1
2014-12-13   3
2014-12-23   2
```
table(start_date) doesn't work either: it tabulates the start_date values themselves (which are meaningless here) instead of grouping them by created_date.
```
> table(start_date)
start_date count
2014-12-10 1
2014-12-11 1
2014-12-13 3
NULL       3
```
I guess a for loop is needed, but I have no idea how to code it. Thanks in advance!
Upvotes: 0
Views: 64
Reputation: 545943
Short version: use `table()` separately on the complete data and on the rows with an empty start_date only, then subtract the second from the first.
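A minimal base-R sketch of the short version, assuming the data is in a data frame `x` with `NULL` start dates stored as `NA` (the data frame is rebuilt here so the snippet runs on its own). Converting the missing-date subset to a factor with the same levels as the full table keeps the two tables aligned, so they can be subtracted directly without a join:

```r
# Sample data from the question; NULL start dates are assumed to be NA
x <- data.frame(
  created_date = c("2014-12-11", "2014-12-11", "2014-12-12", "2014-12-13",
                   "2014-12-13", "2014-12-13", "2014-12-23", "2014-12-23"),
  start_date   = c("2014-12-10", "2014-12-11", "2014-12-13", NA,
                   "2014-12-13", "2014-12-13", NA, NA),
  stringsAsFactors = FALSE
)

# Rows per created_date (all rows)
all_counts <- table(x$created_date)

# Rows per created_date whose start_date is missing; reusing the levels of
# all_counts keeps both tables the same length, including zero counts
na_counts <- table(factor(x$created_date[is.na(x$start_date)],
                          levels = names(all_counts)))

# Subtracting the second table from the first leaves the non-missing counts
counts <- all_counts - na_counts
counts
```

For the sample data this gives 2, 1, 2 and 0 for the four created dates.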
Long version:
Assuming your data is in `x` (and that the `NULL`s are actually `NA`s; see Gist for details), count the entries and put them into `data_frame`s for convenience:
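```r
library(dplyr)
# Number of rows per created_date
all_counts = as_data_frame(table(x$created_date))
# Number of rows per created_date whose start_date is missing
na_counts = as_data_frame(table(x[is.na(x$start_date), ]$created_date))
```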
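If you'd rather not depend on dplyr for this step, base R's `as.data.frame()` method for tables can produce similar count data frames. This sketch (again assuming `NA` rather than `NULL`, with the sample data rebuilt inline) names the count column `n` via `responseName`, so the date column comes out as `Var1` and the counts as `n`:

```r
# Sample data from the question; NULL start dates are assumed to be NA
x <- data.frame(
  created_date = c("2014-12-11", "2014-12-11", "2014-12-12", "2014-12-13",
                   "2014-12-13", "2014-12-13", "2014-12-23", "2014-12-23"),
  start_date   = c("2014-12-10", "2014-12-11", "2014-12-13", NA,
                   "2014-12-13", "2014-12-13", NA, NA),
  stringsAsFactors = FALSE
)

# One row per created_date: Var1 holds the date, n the number of rows
all_counts <- as.data.frame(table(x$created_date), responseName = "n",
                            stringsAsFactors = FALSE)

# Same, restricted to rows whose start_date is missing
na_counts <- as.data.frame(table(x$created_date[is.na(x$start_date)]),
                           responseName = "n", stringsAsFactors = FALSE)

all_counts
na_counts
```

Note that `na_counts` only has rows for the dates that actually have a missing start date, which is why the next step needs a join.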
Subtract `na_counts` from `all_counts`. To do this, we first need to join the two tables. Joining introduces `NA`s (dates without any missing start date have no row in `na_counts`), which we replace with 0s:
```r
full_join(all_counts, na_counts, by = 'Var1') %>%
  mutate(n.y = ifelse(is.na(n.y), 0, n.y)) %>%
  mutate(count = n.x - n.y) %>%  # And finally, subtract the counts.
  select(created_date = Var1, count)
```
Result:
| created_date | count |
|:---------------|--------:|
| 2014-12-11 | 2 |
| 2014-12-12 | 1 |
| 2014-12-13 | 2 |
| 2014-12-23 | 0 |
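As an alternative to subtracting two tables (a different technique from the one above, not a fix to it), you can count the non-missing start dates per group directly: `!is.na(start_date)` is a logical vector, and summing it within each `created_date` with `tapply()` counts the `TRUE` values. A sketch under the same `NA` assumption:

```r
# Sample data from the question; NULL start dates are assumed to be NA
x <- data.frame(
  created_date = c("2014-12-11", "2014-12-11", "2014-12-12", "2014-12-13",
                   "2014-12-13", "2014-12-13", "2014-12-23", "2014-12-23"),
  start_date   = c("2014-12-10", "2014-12-11", "2014-12-13", NA,
                   "2014-12-13", "2014-12-13", NA, NA),
  stringsAsFactors = FALSE
)

# Sum a TRUE/FALSE "start date present" flag within each created_date;
# groups come out sorted, and groups with only NAs get 0
counts <- tapply(!is.na(x$start_date), x$created_date, sum)

# Turn the named result into a two-column data frame
result <- data.frame(created_date = names(counts),
                     count = as.vector(counts),
                     stringsAsFactors = FALSE)
result
```

Because every `created_date` appears in the grouping, dates whose start dates are all missing (2014-12-23 here) show up with a count of 0 without any join or `NA` replacement.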
Upvotes: 1