I have a data frame that looks like this:
df <- data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"),
                 sentences = c("the cat is brown; the bird is green and blue", "the dog is black; the bird is yellow and blue", "the bird is blue"),
                 year = c("2010", "2012", "2001"), stringsAsFactors = FALSE)
df$year <- as.numeric(df$year)
> df
         animals                                     sentences year
1 cat; dog; bird  the cat is brown; the bird is green and blue 2010
2      dog; bird the dog is black; the bird is yellow and blue 2012
3           bird                              the bird is blue 2001
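For each row, I would like to count how many times the animals listed in that row's animals column appear in the sentences column over the previous 5 years, including the same year, and store the total in a new column SUM.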
Edit
For example, in row 2 the animals are dog and bird, and they appear 3 times in the sentences column over the previous 5 years (including the same year): in 2012, "the dog is black; the bird is yellow and blue" (dog and bird), and in 2010, "the bird is green and blue" (bird), for a total of SUM = 3.
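Written as a formula, the rule in the example reads as follows. The notation is illustrative and not from the question: y_j is the year of row j, A_i the animals listed in row i, s_j the sentences of row j, and c(a, s) the number of times animal a occurs in s. The window [y_i - 5, y_i] is taken from the condition (year>=year-5) & (year<=year) in the attempted code.

```latex
\mathrm{SUM}_i = \sum_{j \,:\, y_i - 5 \,\le\, y_j \,\le\, y_i} \; \sum_{a \in A_i} c(a, s_j)

\mathrm{SUM}_2 = \underbrace{c(\text{dog}, s_1) + c(\text{bird}, s_1)}_{2010:\; 0 + 1}
              + \underbrace{c(\text{dog}, s_2) + c(\text{bird}, s_2)}_{2012:\; 1 + 1} = 3
```

Row 3 (2001) does not contribute to row 2 because 2001 lies outside [2007, 2012].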
Desired Outcome
# A tibble: 3 x 4
  animals        sentences                                      year   SUM
  <chr>          <chr>                                         <dbl> <int>
1 cat; dog; bird the cat is brown; the bird is green and blue   2010     2
2 dog; bird      the dog is black; the bird is yellow and blue  2012     3
3 bird           the bird is blue                               2001     1
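A minimal base-R sketch of the computation described above. It assumes that (a) the window is [year - 5, year] inclusive, as in the condition used in the attempted code; (b) an animal is matched as a literal substring, so "cat" would also match inside "category"; and (c) the data are those in the Desired Outcome table. count_fixed is a hypothetical helper, not part of any package.

```r
# Data as in the "Desired Outcome" table
df <- data.frame(
  animals   = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c("the cat is brown; the bird is green and blue",
                "the dog is black; the bird is yellow and blue",
                "the bird is blue"),
  year      = c(2010, 2012, 2001),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of literal (fixed = TRUE) matches of `pattern` in each element of `x`
count_fixed <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x, fixed = TRUE)))
}

# For row i: keep only sentences whose year lies in [year_i - 5, year_i],
# split row i's animals on "; ", and add up every occurrence of every animal
df$SUM <- sapply(seq_len(nrow(df)), function(i) {
  in_window <- df$year >= df$year[i] - 5 & df$year <= df$year[i]
  pets      <- strsplit(df$animals[i], "; ", fixed = TRUE)[[1]]
  sum(sapply(pets, function(a) sum(count_fixed(df$sentences[in_window], a))))
})

df$SUM  # 2 3 1
```

The key difference from a count over all rows is that df$sentences is subset by in_window before counting, so each row only sees sentences from its own window.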
What I have tried
I have used the following code from here and added a logical condition to keep only the previous 5 years:
animals[(year >= year - 5) & (year <= year)]
but it does not give me the desired output. What am I doing wrong?
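library(dplyr)
library(stringr)
library(purrr)
string <- unlist(str_split(df$sentences, ";"))  # every sentence fragment from every row of df
df %>%
  rowwise() %>%
  mutate(SUM = str_split(animals[(year >= year - 5) & (year <= year)], "; ", simplify = TRUE) %>%
           map(~ str_count(string, .)) %>%  # count each animal in every fragment
           unlist() %>%
           sum())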
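A minimal base-R check of the year condition. The sapply() loop below is a stand-in for rowwise() used only for illustration (it is not dplyr itself): within each row, year is that row's single value, so the condition compares the year with itself. Note also that string is built from every row before any year filter is applied. in_window is a hypothetical helper showing a comparison against the whole year column instead.

```r
df <- data.frame(year = c(2010, 2012, 2001))

# Stand-in for rowwise(): each iteration sees only that row's scalar `year`,
# so the condition compares the row's year with itself
cond_rowwise <- sapply(seq_len(nrow(df)), function(i) {
  year <- df$year[i]
  (year >= year - 5) & (year <= year)
})
cond_rowwise  # TRUE TRUE TRUE: the condition never excludes anything

# Comparing the whole year column with the current row's year instead
in_window <- function(i) df$year >= df$year[i] - 5 & df$year <= df$year[i]
in_window(2)  # TRUE TRUE FALSE: the 2010 and 2012 rows fall inside 2012's window
```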
Any help would be much appreciated :)
Try:
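library(dplyr)

df %>%
  # number of animals listed in each row
  mutate(SUM = sapply(strsplit(animals, "; "), length),
         # for each row's year x, sum SUM over rows whose year is in [x - 4, x]
         SUM = sapply(year, function(x) sum(SUM[between(year, x - 5 + 1, x)])))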
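The second sapply() step above is a trailing-window sum. A minimal base-R sketch of just that step, fed with the per-row animal counts from the first step. window_sum is a hypothetical helper; x - width + 1 with width = 5 mirrors between(year, x - 5 + 1, x), since between() is inclusive at both ends.

```r
# Hypothetical helper: for each year x, add up `value` over rows whose year lies in [x - width + 1, x]
window_sum <- function(year, value, width = 5) {
  sapply(year, function(x) sum(value[year >= x - width + 1 & year <= x]))
}

# Number of animals listed per row, as in the first mutate() step
n_animals <- lengths(strsplit(c("cat; dog; bird", "dog; bird", "bird"), "; "))  # 3 2 1

window_sum(c(2010, 2018, 2001), n_animals)  # 3 2 1, the years and SUM shown in the output below
window_sum(c(2010, 2012, 2001), n_animals)  # 3 5 1, with the question's year 2012 for row 2
```

With 2012 for row 2, the 2010 row falls inside row 2's window, so its count of 3 is added to row 2's own 2.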
This is the output:
         animals                                                        sentences year SUM
1 cat; dog; bird the cat is brown; the dog is barking; the bird is green and blue 2010   3
2      dog; bird                    the dog is black; the bird is yellow and blue 2018   2
3           bird                                                 the bird is blue 2001   1
Of course, in 2010 it doesn't correspond to your desired output, as you haven't provided the data for the years before.