Reputation: 1613
I have a dataframe with 213 rows and 2 columns (Date and Article). The final aim is to reduce the number of rows by grouping the Date by quarterly. Obviously, I want texts from the Article column to be merged accordingly.
Let's take an example.
Date <- c("2000-01-05", "2000-02-03", "2000-03-02", "2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19", "2000-11-02", "2000-12-14")
Article <- c("Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text")
Date <- data.frame(Date)
Article <- data.frame(Article)
df <- cbind(Date, Article)
#Dataframe
Date Article
1 2000-01-05 Long Text
2 2000-02-03 Long Text
3 2000-03-02 Long Text
4 2000-03-30 Long Text
5 2000-04-13 Long Text
6 2000-05-11 Long Text
7 2000-06-08 Long Text
8 2000-07-06 Long Text
9 2000-09-14 Long Text
10 2000-10-05 Long Text
11 2000-10-19 Long Text
12 2000-11-02 Long Text
13 2000-12-14 Long Text
The final output I would like to obtain is the following:
Date Article
1 2000 Q1 Long Text, Long Text, Long Text, Long Text
2 2000 Q2 Long Text, Long Text, Long Text
3 2000 Q3 Long Text, Long Text
4 2000 Q4 Long Text, Long Text, Long Text, Long Text
Essentially, the rows have been grouped together by quarters as well as the corresponding text.
I tried to look around but I have no idea how to do it, unofrtunately.
Can anyone help me out?
Thanks!
Upvotes: 2
Views: 596
Reputation: 5798
Base R solution:
# Row-wise concatenate Article vec by the group of year & qtr:
aggregate(list(Article = df$Article),
by = list(Date = paste(gsub("[-].*", "", df$Date), quarters(df$Date), sep = " ")),
paste, sep = ", ")
Data:
df <- data.frame(Date = as.Date(c("2000-01-05",
"2000-02-03",
"2000-03-02",
"2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08",
"2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19",
"2000-11-02", "2000-12-14"),
"%Y-%m-%d"),
Article = c("Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text"))
Upvotes: 1
Reputation: 887771
We can use as.yearqtr
from zoo
to do summarization
library(zoo)
library(data.table)
setDT(df)[, .(Article = toString(Article)),.(Date = as.yearqtr(as.IDate(Date)))]
# Date Article
#1: 2000 Q1 Long Text, Long Text, Long Text, Long Text
#2: 2000 Q2 Long Text, Long Text, Long Text
#3: 2000 Q3 Long Text, Long Text
#4: 2000 Q4 Long Text, Long Text, Long Text, Long Text
Upvotes: 2
Reputation: 40171
One dplyr
and lubridate
option could be:
df %>%
group_by(Date = as.character(lubridate::quarter(ymd(Date), with_year = TRUE))) %>%
summarise(Article = paste0(Article, collapse = ","))
Date Article
<chr> <chr>
1 2000.1 Long Text,Long Text,Long Text,Long Text
2 2000.2 Long Text,Long Text,Long Text
3 2000.3 Long Text,Long Text
4 2000.4 Long Text,Long Text,Long Text,Long Text
Upvotes: 3