Reputation: 47
I asked a similar question before but realized that I wasn't specific enough. I am currently analyzing data pulled from twitter in R. The tweets are from different users written in different periods of time (collecting data within one year per user). I want to plot the data using a dictionary but therefore I need to unify the time range of my data.
For simplicity I have created two dataframes to explain what I am looking for. This is what my data frame currently looks like (only much more data):
Author <- rep(c("Person1"), times = 7)
Text <- c("A","B","C", "D", "E", "F", "G")
Date <- as.Date(c('2015-01-15','2015-01-23','2015-02-14','2015-02-20', '2015-02-25', '2015-03-04', '2015-04-20'))
Pers1 <- data.frame(Author,Text,Date)
Author <- rep(c("Person2"), times = 7)
Text <- c("H","I","J", "K", "L", "M", "N")
Date <- as.Date(c('2020-08-10','2020-08-15','2020-09-05','2020-09-20', '2020-09-30', '2020-10-15','2020-10-25'))
Pers2 <- data.frame(Author,Text,Date)
DF <- bind_rows(Pers1, Pers2)
E.g. I am looking at tweets from Person 1 from Jan. 15 2015 to Jan 15 2016. First month of observation (Jan 15 to Feb 15) should be referred to as first month and so on (until 12th month)
Observation for Person2 starts from August 10th (First month until Sep 10th, Second Month from Sept 10th to October 10th...)
In the end I want the data frame to look like this:
> DF
Author Text Date Period
1 Person1 A 2015-01-15 First Month
2 Person1 B 2015-01-23 First Month
3 Person1 C 2015-02-14 First Month
4 Person1 D 2015-02-20 Second Month
5 Person1 E 2015-02-25 Second Month
6 Person1 F 2015-03-04 Second Month
7 Person1 G 2015-04-20 Third Month
8 Person2 H 2020-08-10 First Month
9 Person2 I 2020-08-15 First Month
10 Person2 J 2020-09-05 First Month
11 Person2 K 2020-09-20 Second Month
12 Person2 L 2020-09-30 Second Month
13 Person2 M 2020-10-15 Third Month
14 Person2 N 2020-10-25 Third Month
Maybe I have to prep each data frame before combining them to a large data frame but I don't know how to do it. Thank you in advance for all suggestions.
Upvotes: 3
Views: 314
Reputation: 5897
code
library(lubridate)
DF %>%
group_by(Author) %>%
mutate(Period = 1 + (interval(first(Date), Date) %/% months(1)))
results
Author Text Date Period
<fct> <fct> <date> <dbl>
1 Person1 A 2015-01-15 1
2 Person1 B 2015-01-23 1
3 Person1 C 2015-02-14 1
4 Person1 D 2015-02-20 2
5 Person1 E 2015-02-25 2
6 Person1 F 2015-03-04 2
7 Person1 G 2015-04-20 4
8 Person2 H 2020-08-10 1
9 Person2 I 2020-08-15 1
10 Person2 J 2020-09-05 1
11 Person2 K 2020-09-20 2
12 Person2 L 2020-09-30 2
13 Person2 M 2020-10-15 3
14 Person2 N 2020-10-25 3
Upvotes: 2
Reputation: 52319
Using MESS::cumsumbinning
library(dplyr)
DF %>%
group_by(Author) %>%
mutate(Month = MESS::cumsumbinning(c(0,diff(Date - first(Date))), 30, cutwhenpassed = F))
Author Text Date Month
<chr> <chr> <date> <int>
1 Person1 A 2015-01-15 1
2 Person1 B 2015-01-23 1
3 Person1 C 2015-02-14 1
4 Person1 D 2015-02-20 2
5 Person1 E 2015-02-25 2
6 Person1 F 2015-03-04 2
7 Person1 G 2015-04-20 3
8 Person2 H 2020-08-10 1
9 Person2 I 2020-08-15 1
10 Person2 J 2020-09-05 1
11 Person2 K 2020-09-20 2
12 Person2 L 2020-09-30 2
13 Person2 M 2020-10-15 3
14 Person2 N 2020-10-25 3
To get your expected result, you can use english::ordinal
:
library(english)
library(tidyverse)
library(MESS)
DF %>%
group_by(Author) %>%
mutate(Month = MESS::cumsumbinning(c(0,diff(Date - first(Date))), 30, cutwhenpassed = F) %>%
ordinal() %>%
paste(., "Month") %>%
stringr::str_to_title()
)
Author Text Date Month
<chr> <chr> <date> <chr>
1 Person1 A 2015-01-15 First Month
2 Person1 B 2015-01-23 First Month
3 Person1 C 2015-02-14 First Month
4 Person1 D 2015-02-20 Second Month
5 Person1 E 2015-02-25 Second Month
6 Person1 F 2015-03-04 Second Month
7 Person1 G 2015-04-20 Third Month
8 Person2 H 2020-08-10 First Month
9 Person2 I 2020-08-15 First Month
10 Person2 J 2020-09-05 First Month
11 Person2 K 2020-09-20 Second Month
12 Person2 L 2020-09-30 Second Month
13 Person2 M 2020-10-15 Third Month
14 Person2 N 2020-10-25 Third Month
Upvotes: 0
Reputation: 174393
You could do something like this:
library(dplyr)
months_since_start <- function(dates, start_date) {
floor(as.numeric(difftime(dates, start_date, unit = "week")) / 4.33) + 1
}
DF %>%
group_by(Author) %>%
mutate(month = months_since_start(Date, first(Date)))
#> # A tibble: 14 x 4
#> # Groups: Author [2]
#> Author Text Date month
#> <chr> <chr> <date> <dbl>
#> 1 Person1 A 2015-01-15 1
#> 2 Person1 B 2015-01-23 1
#> 3 Person1 C 2015-02-14 1
#> 4 Person1 D 2015-02-20 2
#> 5 Person1 E 2015-02-25 2
#> 6 Person1 F 2015-03-04 2
#> 7 Person1 G 2015-04-20 4
#> 8 Person2 H 2020-08-10 1
#> 9 Person2 I 2020-08-15 1
#> 10 Person2 J 2020-09-05 1
#> 11 Person2 K 2020-09-20 2
#> 12 Person2 L 2020-09-30 2
#> 13 Person2 M 2020-10-15 3
#> 14 Person2 N 2020-10-25 3
Upvotes: 0