stevewilko
stevewilko

Reputation: 11

R: Sum Two Lists Embedded in Rows of a Dataframe by Groups

I have data on groups of bonds and their cash flows. Each IssueID contains multiple BondIDs. Each BondID has its own cash flow, rendered as a list within the data frame via dplyr mutate. The cash flow lists do not have equal numbers of elements. The data structure is:

IssueID BondID cashflow
AA AA1 c(-1000, 50, 50, 1050)
AA AA2 c(-1000, 25, 25, 25, 25, 1025)
AB AB1 c(-2000, 100, 100, 2100)
AB AB1 c(-1000, 75, 75, 75, 75, 1075)

I need to sum each BondID's cashflows by IssueID while maintaining each element's position in the list. The output needs to look like:

IssueID sumcashflow
AA c(-2000, 75, 75, 1075, 25, 1025)
AB c(-3000, 175, 175, 2175, 75, 1075)

Appreciate any assistance. Thanks.

Am not able to sum the lists.

Upvotes: 1

Views: 57

Answers (3)

langtang
langtang

Reputation: 24742

You can write a small function, and apply it to each IssueID

f <- function(cf) {
  ml = lengths(cf)
  for(i in seq_along(ml)) {
    if(length(cf[[i]])<max(ml)) cf[[i]]=c(cf[[i]],rep(0,max(ml)-length(cf[[i]])))
  }
  list(rowSums(matrix(unlist(cf),ncol=length(cf))))
}

Using dplyr:

d %>% group_by(IssueID) %>% summarize(sumcashflow = f(cashflow))

Using data.table:

setDT(d)[,.(f(cashflow)), by=IssueID]

Output:

   IssueID                         sumcashflow
1:      AA -2000,   75,   75, 1075,   25, 1025
2:      AB -3000,  175,  175, 2175,   75, 1075

Input:

d = data.table(
    IssueID = c("AA","AA","AB","AB"),
    BondID = c("AA1", "AA2","AB1", "AB2"),
    cashflow = list(c(-1000,50,50,1050),
                    c(-1000,25,25,25,25,1025),
                    c(-2000,100,100,2100),
                    c(-1000,75,75,75,75,1075))

Upvotes: 1

Onyambu
Onyambu

Reputation: 79288

df1 %>%
  group_by(IssueID) %>%
  summarise(val = list(colSums(do.call(qpcR:::rbind.na, cashflow), na.rm = TRUE)))

# A tibble: 2 x 2
  IssueID val      
  <chr>   <list>   
1 AA      <dbl [6]>
2 AB      <dbl [6]>

val:

[[1]]
[1] -2000    75    75  1075    25  1025

[[2]]
[1] -3000   175   175  2175    75  1075

Upvotes: 1

akrun
akrun

Reputation: 887501

Here is one option - grouped by 'IssueID', convert the cashflow list to a named list with sequence (row_number()), then convert to a tibble using enframe, unnest the list column, create a grouping by the rowid of the 'name' (in case there are unequal lengths), and get the sum of 'value', pull the column as a list

library(dplyr)
library(tibble)
library(tidyr)
library(data.table)
out <- df1 %>%
  group_by(IssueID) %>%
  summarise(sumcashflow = setNames(cashflow, row_number()) %>%
         enframe %>%
         unnest(value) %>%
         group_by(grp = rowid(name)) %>%
         summarise(value = sum(value, na.rm = TRUE)) %>%
         pull(value) %>%
        list(.))

-output

> out$sumcashflow
[[1]]
[1] -2000    75    75  1075    25  1025

[[2]]
[1] -3000   175   175  2175    75  1075

> out
# A tibble: 2 × 2
  IssueID sumcashflow
  <chr>   <list>     
1 AA      <dbl [6]>  
2 AB      <dbl [6]>  

or using base R with split

lst1 <- lapply(split(df1$cashflow, df1$IssueID), \(x) {
      mx <- max(lengths(x))
     rowSums(sapply(x, `length<-`, mx), na.rm = TRUE)
})

> lst1
$AA
[1] -2000    75    75  1075    25  1025

$AB
[1] -3000   175   175  2175    75  1075

data

df1 <- structure(list(IssueID = c("AA", "AA", "AB", "AB"), BondID = c("AA1", 
"AA2", "AB1", "AB1"), cashflow = list(c(-1000, 50, 50, 1050), 
    c(-1000, 25, 25, 25, 25, 1025), c(-2000, 100, 100, 2100), 
    c(-1000, 75, 75, 75, 75, 1075))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 1

Related Questions