Hanna Kowol
Hanna Kowol

Reputation: 47

How to create a column based on time ranges in another column irrespective of calender month in R

I asked a similar question before but realized that I wasn't specific enough. I am currently analyzing data pulled from twitter in R. The tweets are from different users written in different periods of time (collecting data within one year per user). I want to plot the data using a dictionary but therefore I need to unify the time range of my data.

For simplicity I have created two dataframes to explain what I am looking for. This is what my data frame currently looks like (only much more data):

Author <- rep(c("Person1"), times = 7)
Text <- c("A","B","C", "D", "E", "F", "G")
Date <- as.Date(c('2015-01-15','2015-01-23','2015-02-14','2015-02-20', '2015-02-25', '2015-03-04', '2015-04-20'))
Pers1 <- data.frame(Author,Text,Date)

Author <- rep(c("Person2"), times = 7)
Text <- c("H","I","J", "K", "L", "M", "N")
Date <- as.Date(c('2020-08-10','2020-08-15','2020-09-05','2020-09-20', '2020-09-30', '2020-10-15','2020-10-25'))
Pers2 <- data.frame(Author,Text,Date)

DF <- bind_rows(Pers1, Pers2)

E.g. I am looking at tweets from Person 1 from Jan. 15 2015 to Jan 15 2016. First month of observation (Jan 15 to Feb 15) should be referred to as first month and so on (until 12th month)

Observation for Person2 starts from August 10th (First month until Sep 10th, Second Month from Sept 10th to October 10th...)

In the end I want the data frame to look like this:

> DF
    Author Text       Date       Period
1  Person1    A 2015-01-15  First Month
2  Person1    B 2015-01-23  First Month
3  Person1    C 2015-02-14  First Month
4  Person1    D 2015-02-20 Second Month
5  Person1    E 2015-02-25 Second Month
6  Person1    F 2015-03-04 Second Month
7  Person1    G 2015-04-20  Third Month
8  Person2    H 2020-08-10  First Month
9  Person2    I 2020-08-15  First Month
10 Person2    J 2020-09-05  First Month
11 Person2    K 2020-09-20 Second Month
12 Person2    L 2020-09-30 Second Month
13 Person2    M 2020-10-15  Third Month
14 Person2    N 2020-10-25  Third Month

Maybe I have to prep each data frame before combining them to a large data frame but I don't know how to do it. Thank you in advance for all suggestions.

Upvotes: 3

Views: 314

Answers (3)

Merijn van Tilborg
Merijn van Tilborg

Reputation: 5897

code

library(lubridate)

DF %>%
  group_by(Author) %>%
  mutate(Period = 1 + (interval(first(Date), Date) %/% months(1)))

results

   Author  Text  Date       Period
   <fct>   <fct> <date>      <dbl>
 1 Person1 A     2015-01-15      1
 2 Person1 B     2015-01-23      1
 3 Person1 C     2015-02-14      1
 4 Person1 D     2015-02-20      2
 5 Person1 E     2015-02-25      2
 6 Person1 F     2015-03-04      2
 7 Person1 G     2015-04-20      4
 8 Person2 H     2020-08-10      1
 9 Person2 I     2020-08-15      1
10 Person2 J     2020-09-05      1
11 Person2 K     2020-09-20      2
12 Person2 L     2020-09-30      2
13 Person2 M     2020-10-15      3
14 Person2 N     2020-10-25      3

Upvotes: 2

Ma&#235;l
Ma&#235;l

Reputation: 52319

Using MESS::cumsumbinning

library(dplyr)

DF %>% 
  group_by(Author) %>% 
  mutate(Month = MESS::cumsumbinning(c(0,diff(Date - first(Date))), 30, cutwhenpassed = F))

   Author  Text  Date       Month
   <chr>   <chr> <date>     <int>
 1 Person1 A     2015-01-15     1
 2 Person1 B     2015-01-23     1
 3 Person1 C     2015-02-14     1
 4 Person1 D     2015-02-20     2
 5 Person1 E     2015-02-25     2
 6 Person1 F     2015-03-04     2
 7 Person1 G     2015-04-20     3
 8 Person2 H     2020-08-10     1
 9 Person2 I     2020-08-15     1
10 Person2 J     2020-09-05     1
11 Person2 K     2020-09-20     2
12 Person2 L     2020-09-30     2
13 Person2 M     2020-10-15     3
14 Person2 N     2020-10-25     3

To get your expected result, you can use english::ordinal:

library(english)
library(tidyverse)
library(MESS)
DF %>% 
  group_by(Author) %>% 
  mutate(Month = MESS::cumsumbinning(c(0,diff(Date - first(Date))), 30, cutwhenpassed = F) %>% 
                    ordinal() %>% 
                    paste(., "Month") %>% 
                    stringr::str_to_title()
         )

   Author  Text  Date       Month       
   <chr>   <chr> <date>     <chr>       
 1 Person1 A     2015-01-15 First Month 
 2 Person1 B     2015-01-23 First Month 
 3 Person1 C     2015-02-14 First Month 
 4 Person1 D     2015-02-20 Second Month
 5 Person1 E     2015-02-25 Second Month
 6 Person1 F     2015-03-04 Second Month
 7 Person1 G     2015-04-20 Third Month 
 8 Person2 H     2020-08-10 First Month 
 9 Person2 I     2020-08-15 First Month 
10 Person2 J     2020-09-05 First Month 
11 Person2 K     2020-09-20 Second Month
12 Person2 L     2020-09-30 Second Month
13 Person2 M     2020-10-15 Third Month 
14 Person2 N     2020-10-25 Third Month 

Upvotes: 0

Allan Cameron
Allan Cameron

Reputation: 174393

You could do something like this:

library(dplyr)

months_since_start <- function(dates, start_date) {
  floor(as.numeric(difftime(dates, start_date, unit = "week")) / 4.33) + 1
}

DF %>% 
  group_by(Author) %>% 
  mutate(month = months_since_start(Date, first(Date)))

#> # A tibble: 14 x 4
#> # Groups:   Author [2]
#>    Author  Text  Date       month
#>    <chr>   <chr> <date>     <dbl>
#>  1 Person1 A     2015-01-15     1
#>  2 Person1 B     2015-01-23     1
#>  3 Person1 C     2015-02-14     1
#>  4 Person1 D     2015-02-20     2
#>  5 Person1 E     2015-02-25     2
#>  6 Person1 F     2015-03-04     2
#>  7 Person1 G     2015-04-20     4
#>  8 Person2 H     2020-08-10     1
#>  9 Person2 I     2020-08-15     1
#> 10 Person2 J     2020-09-05     1
#> 11 Person2 K     2020-09-20     2
#> 12 Person2 L     2020-09-30     2
#> 13 Person2 M     2020-10-15     3
#> 14 Person2 N     2020-10-25     3

Upvotes: 0

Related Questions