Rollo99
Rollo99

Reputation: 1613

How to group rows in a dataframe by quarters?

I have a dataframe with 213 rows and 2 columns (Date and Article). The final aim is to reduce the number of rows by grouping the Date by quarterly. Obviously, I want texts from the Article column to be merged accordingly.

Let's take an example.

Date <- c("2000-01-05", "2000-02-03", "2000-03-02", "2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19", "2000-11-02", "2000-12-14")
Article <- c("Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text")

Date <- data.frame(Date)
Article <- data.frame(Article)

df <- cbind(Date, Article)

#Dataframe

Date           Article
1  2000-01-05 Long Text
2  2000-02-03 Long Text
3  2000-03-02 Long Text
4  2000-03-30 Long Text
5  2000-04-13 Long Text
6  2000-05-11 Long Text
7  2000-06-08 Long Text
8  2000-07-06 Long Text
9  2000-09-14 Long Text
10 2000-10-05 Long Text
11 2000-10-19 Long Text
12 2000-11-02 Long Text
13 2000-12-14 Long Text

The final output I would like to obtain is the following:

Date         Article
1  2000 Q1   Long Text, Long Text, Long Text, Long Text
2  2000 Q2   Long Text, Long Text, Long Text
3  2000 Q3   Long Text, Long Text
4  2000 Q4   Long Text, Long Text, Long Text, Long Text

Essentially, the rows have been grouped together by quarters as well as the corresponding text.

I tried to look around but I have no idea how to do it, unofrtunately.

Can anyone help me out?

Thanks!

Upvotes: 2

Views: 596

Answers (3)

hello_friend
hello_friend

Reputation: 5798

Base R solution:

# Row-wise concatenate Article vec by the group of year & qtr: 

aggregate(list(Article = df$Article),

          by = list(Date = paste(gsub("[-].*", "", df$Date), quarters(df$Date), sep = " ")),

          paste, sep = ", ")

Data:

df <- data.frame(Date = as.Date(c("2000-01-05",
                                   "2000-02-03",
                                   "2000-03-02",
                                   "2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08",
                                   "2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19",
                                   "2000-11-02", "2000-12-14"),
                                 "%Y-%m-%d"),
            Article = c("Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text"))

Upvotes: 1

akrun
akrun

Reputation: 887771

We can use as.yearqtr from zoo to do summarization

library(zoo)
library(data.table)
setDT(df)[, .(Article = toString(Article)),.(Date = as.yearqtr(as.IDate(Date)))]
#   Date                                    Article
#1: 2000 Q1 Long Text, Long Text, Long Text, Long Text
#2: 2000 Q2            Long Text, Long Text, Long Text
#3: 2000 Q3                       Long Text, Long Text
#4: 2000 Q4 Long Text, Long Text, Long Text, Long Text

Upvotes: 2

tmfmnk
tmfmnk

Reputation: 40171

One dplyr and lubridate option could be:

df %>%
 group_by(Date = as.character(lubridate::quarter(ymd(Date), with_year = TRUE))) %>%
 summarise(Article = paste0(Article, collapse = ",")) 

  Date   Article                                
  <chr>  <chr>                                  
1 2000.1 Long Text,Long Text,Long Text,Long Text
2 2000.2 Long Text,Long Text,Long Text          
3 2000.3 Long Text,Long Text                    
4 2000.4 Long Text,Long Text,Long Text,Long Text

Upvotes: 3

Related Questions