Reputation: 47
I am currently analyzing data pulled from twitter in R. The tweets are from different users written in different periods of time (collecting data within one year per user). I want to plot the data using a dictionary but therefore I need to unify the time range of my data.
For simplicity I have created two dataframes to explain what I am looking for. This is what my data frame currently looks like (only much more data):
Author <- rep(c("Person1"), times = 4)
Text <- c("Thanks","for","your", "help")
Date <- as.Date(c('2015-01-03','2015-01-15','2015-02-16','2015-02-20'))
Pers1 <- data.frame(Author,Text,Date)
Author <- rep(c("Person2"), times = 4)
Text <- c("I","really","appreciate", "it")
Date <- as.Date(c('2020-08-03','2020-08-15','2020-09-16','2020-09-20'))
Pers2 <- data.frame(Author,Text,Date)
DF <- bind_rows(Pers1, Pers2)
Person 1 wrote two tweets in January (first month of data collection) and another two in February (Second month of data collection). Person 2 started tweeting in August five years later (First month of data collection) and september (second month). In the end I want the data frame to look like this:
Author Text Date Period
1 Person1 Thanks 2015-01-03 First Month
2 Person1 for 2015-01-15 First Month
3 Person1 your 2015-02-16 Second Month
4 Person1 help 2015-02-20 Second Month
5 Person2 I 2020-08-03 First Month
6 Person2 really 2020-08-15 First Month
7 Person2 appreciate 2020-09-16 Second Month
8 Person2 it 2020-09-20 Second Month
I have already tried multiple things but nothing worked out the way I wanted. Thank you in advance for all suggestions!
Upvotes: 0
Views: 136
Reputation: 66870
We could do this by joining your data to a summarized version of itself where each month is given a Period
label.
# add month
DF <- bind_rows(Pers1, Pers2) %>%
mutate(month = lubridate::floor_date(Date, "month"))
DF %>%
left_join( # join to summarized data
DF %>%
distinct(Author, month = lubridate::floor_date(Date, "month")) %>%
group_by(Author) %>%
mutate(Period = paste("Month", row_number()))
) %>%
select(-month)
Joining, by = c("Author", "month")
Author Text Date Period
1 Person1 Thanks 2015-01-03 Month 1
2 Person1 for 2015-01-15 Month 1
3 Person1 your 2015-02-16 Month 2
4 Person1 help 2015-02-20 Month 2
5 Person2 I 2020-08-03 Month 1
6 Person2 really 2020-08-15 Month 1
7 Person2 appreciate 2020-09-16 Month 2
8 Person2 it 2020-09-20 Month 2
Upvotes: 0
Reputation: 7415
You can use dplyr
and lubridate
:
DF %>%
group_by(Author) %>%
mutate(Period = case_when(month(Date) == min(month(Date)) ~ "First Month",
TRUE ~ "Second Month"))
The above assumes that you only have two months maximum. If you wanted to expand this to n
number of months, you could build out the case_when
statement by adding 1, 2, etc:
DF %>%
group_by(Author) %>%
mutate(Period = case_when(month(Date) == min(month(Date)) ~ "First Month",
month(Date) == min(month(Date))+1 ~ "Second Month"))
Both give us:
# A tibble: 8 × 4
# Groups: Author [2]
Author Text Date Period
<chr> <chr> <date> <chr>
1 Person1 Thanks 2015-01-03 First Month
2 Person1 for 2015-01-15 First Month
3 Person1 your 2015-02-16 Second Month
4 Person1 help 2015-02-20 Second Month
5 Person2 I 2020-08-03 First Month
6 Person2 really 2020-08-15 First Month
7 Person2 appreciate 2020-09-16 Second Month
8 Person2 it 2020-09-20 Second Month
Upvotes: 0