Reputation: 47

Creating column in data frame depending on time ranges in another column

I am currently analyzing data pulled from twitter in R. The tweets are from different users written in different periods of time (collecting data within one year per user). I want to plot the data using a dictionary but therefore I need to unify the time range of my data.

For simplicity I have created two dataframes to explain what I am looking for. This is what my data frame currently looks like (only much more data):

Author <- rep(c("Person1"), times = 4)
Text <- c("Thanks","for","your", "help")
Date <- as.Date(c('2015-01-03','2015-01-15','2015-02-16','2015-02-20'))
Pers1 <- data.frame(Author,Text,Date)

Author <- rep(c("Person2"), times = 4)
Text <- c("I","really","appreciate", "it")
Date <- as.Date(c('2020-08-03','2020-08-15','2020-09-16','2020-09-20'))
Pers2 <- data.frame(Author,Text,Date)

DF <- bind_rows(Pers1, Pers2)

Person 1 wrote two tweets in January (first month of data collection) and another two in February (Second month of data collection). Person 2 started tweeting in August five years later (First month of data collection) and september (second month). In the end I want the data frame to look like this:

Author       Text       Date       Period
1 Person1     Thanks 2015-01-03  First Month
2 Person1        for 2015-01-15  First Month
3 Person1       your 2015-02-16 Second Month
4 Person1       help 2015-02-20 Second Month
5 Person2          I 2020-08-03  First Month
6 Person2     really 2020-08-15  First Month
7 Person2 appreciate 2020-09-16 Second Month
8 Person2         it 2020-09-20 Second Month

I have already tried multiple things but nothing worked out the way I wanted. Thank you in advance for all suggestions!

Upvotes: 0

Answers (2)

Jon Spring

Reputation: 66870

We could do this by joining your data to a summarized version of itself where each month is given a Period label.

# add month
DF <- bind_rows(Pers1, Pers2) %>%
  mutate(month = lubridate::floor_date(Date, "month")) 

DF %>%
  left_join(  # join to summarized data
    DF %>%
      distinct(Author, month = lubridate::floor_date(Date, "month")) %>%
      group_by(Author) %>%
      mutate(Period = paste("Month", row_number()))
  ) %>%
  select(-month)


Joining, by = c("Author", "month")
   Author       Text       Date  Period
1 Person1     Thanks 2015-01-03 Month 1
2 Person1        for 2015-01-15 Month 1
3 Person1       your 2015-02-16 Month 2
4 Person1       help 2015-02-20 Month 2
5 Person2          I 2020-08-03 Month 1
6 Person2     really 2020-08-15 Month 1
7 Person2 appreciate 2020-09-16 Month 2
8 Person2         it 2020-09-20 Month 2

Upvotes: 0

Matt

Reputation: 7415

You can use dplyr and lubridate:

DF %>% 
  group_by(Author) %>% 
  mutate(Period = case_when(month(Date) == min(month(Date)) ~ "First Month",
                            TRUE ~ "Second Month"))

The above assumes that you only have two months maximum. If you wanted to expand this to n number of months, you could build out the case_when statement by adding 1, 2, etc:

DF %>% 
  group_by(Author) %>% 
  mutate(Period = case_when(month(Date) == min(month(Date)) ~ "First Month",
                            month(Date) == min(month(Date))+1 ~ "Second Month"))

Both give us:

# A tibble: 8 × 4
# Groups:   Author [2]
  Author  Text       Date       Period      
  <chr>   <chr>      <date>     <chr>       
1 Person1 Thanks     2015-01-03 First Month 
2 Person1 for        2015-01-15 First Month 
3 Person1 your       2015-02-16 Second Month
4 Person1 help       2015-02-20 Second Month
5 Person2 I          2020-08-03 First Month 
6 Person2 really     2020-08-15 First Month 
7 Person2 appreciate 2020-09-16 Second Month
8 Person2 it         2020-09-20 Second Month

Upvotes: 0

Creating column in data frame depending on time ranges in another column

Answers (2)

Related Questions