Reputation: 2556
I have time series data that looks like this:
keyword byMonth n_views
business tax preparation software Dec-2016 5
corporate income tax solution Nov-2016 3
corporate income tax solution Mar-2017 2
corporate tax provision Dec-2016 5
corporate tax provision Oct-2016 1
data collection Mar-2017 39
data collection May-2017 26
data collection Apr-2017 22
data collection Feb-2017 15
data collection Jan-2017 15
data collection Nov-2016 13
data collection Dec-2016 7
data collection Oct-2016 6
I want to select only those keywords that appear in every month from Oct-2016 to May-2017, using dplyr or any other convenient method. In this case, only the observations associated with the keyword data collection should be in the output. I am having a hard time figuring this out. Thanks a ton in advance.
Upvotes: 0
Views: 460
Reputation: 39154
We can use functions from dplyr and tidyr. The keywords in dt2 are the ones with complete month coverage.
library(dplyr)
library(tidyr)
dt2 <- dt %>%
  # Spread byMonth into one column per month; months a keyword lacks become NA
  spread(byMonth, n_views) %>%
  # Keep only the rows (keywords) that have no NA in any column
  filter(rowSums(!is.na(.)) == ncol(.))
If the original format is needed, we can use gather to convert it back.
dt3 <- dt2 %>%
  gather(byMonth, n_views, -keyword)
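In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer. The same idea can be sketched with the newer functions as below. This is only an illustration: the small data frame is a subset of the question's data, and the names dt_wide and dt_long are made up for this example. drop_na() is used here in place of the rowSums() filter.

```r
library(dplyr)
library(tidyr)

# Subset of the question's data: one complete keyword, one incomplete keyword
dt <- data.frame(
  keyword = c(rep("data collection", 8), rep("corporate tax provision", 2)),
  byMonth = c("Oct-2016", "Nov-2016", "Dec-2016", "Jan-2017",
              "Feb-2017", "Mar-2017", "Apr-2017", "May-2017",
              "Dec-2016", "Oct-2016"),
  n_views = c(6, 13, 7, 15, 15, 39, 22, 26, 5, 1),
  stringsAsFactors = FALSE
)

dt_wide <- dt %>%
  # One column per month; months a keyword lacks are filled with NA
  pivot_wider(names_from = byMonth, values_from = n_views) %>%
  # Drop every keyword that has an NA in any month column
  drop_na()

# Convert back to the original long format if needed
dt_long <- dt_wide %>%
  pivot_longer(-keyword, names_to = "byMonth", values_to = "n_views")
```

As with the spread version, "complete" here means covering every month that occurs anywhere in the data, not a fixed calendar range.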
The example data frame dt used above can be created as follows:
dt <- read.table(text = " keyword byMonth n_views
'business tax preparation software' 'Dec-2016' 5
'corporate income tax solution' 'Nov-2016' 3
'corporate income tax solution' 'Mar-2017' 2
'corporate tax provision' 'Dec-2016' 5
'corporate tax provision' 'Oct-2016' 1
'data collection' 'Mar-2017' 39
'data collection' 'May-2017' 26
'data collection' 'Apr-2017' 22
'data collection' 'Feb-2017' 15
'data collection' 'Jan-2017' 15
'data collection' 'Nov-2016' 13
'data collection' 'Dec-2016' 7
'data collection' 'Oct-2016' 6",
header = TRUE, stringsAsFactors = FALSE)
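An alternative that skips the reshaping step is to group by keyword and keep only the groups that contain every required month. The sketch below assumes the required range is exactly Oct-2016 to May-2017; the names required_months and dt_complete are hypothetical, and the data frame is a subset of the question's data so the block runs on its own.

```r
library(dplyr)

# Subset of the question's data: one complete keyword, one incomplete keyword
dt <- data.frame(
  keyword = c(rep("data collection", 8), rep("corporate tax provision", 2)),
  byMonth = c("Oct-2016", "Nov-2016", "Dec-2016", "Jan-2017",
              "Feb-2017", "Mar-2017", "Apr-2017", "May-2017",
              "Dec-2016", "Oct-2016"),
  n_views = c(6, 13, 7, 15, 15, 39, 22, 26, 5, 1),
  stringsAsFactors = FALSE
)

# Assumption: these eight months are the ones every keyword must cover
required_months <- c("Oct-2016", "Nov-2016", "Dec-2016", "Jan-2017",
                     "Feb-2017", "Mar-2017", "Apr-2017", "May-2017")

dt_complete <- dt %>%
  group_by(keyword) %>%
  # Keep a keyword only if all required months appear among its rows
  filter(all(required_months %in% byMonth)) %>%
  ungroup()
```

Because the months are listed explicitly, this version does not depend on which months happen to be present in the data, and the result stays in the original long format.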
Upvotes: 2