Reputation: 11
I have data on groups of bonds and their cash flows. Each IssueID contains multiple BondIDs. Each BondID has its own cash flow, rendered as a list within the data frame via dplyr mutate. The cash flow lists do not have equal numbers of elements. The data structure is:
IssueID | BondID | cashflow |
---|---|---|
AA | AA1 | c(-1000, 50, 50, 1050) |
AA | AA2 | c(-1000, 25, 25, 25, 25, 1025) |
AB | AB1 | c(-2000, 100, 100, 2100) |
AB | AB1 | c(-1000, 75, 75, 75, 75, 1075) |
I need to sum each BondID's cashflows by IssueID while maintaining each element's position in the list. The output needs to look like:
IssueID | sumcashflow |
---|---|
AA | c(-2000, 75, 75, 1075, 25, 1025) |
AB | c(-3000, 175, 175, 2175, 75, 1075) |
Appreciate any assistance. Thanks.
Am not able to sum the lists.
Upvotes: 1
Views: 57
Reputation: 24742
You can write a small function, and apply it to each IssueID
f <- function(cf) {
ml = lengths(cf)
for(i in seq_along(ml)) {
if(length(cf[[i]])<max(ml)) cf[[i]]=c(cf[[i]],rep(0,max(ml)-length(cf[[i]])))
}
list(rowSums(matrix(unlist(cf),ncol=length(cf))))
}
Using dplyr
:
d %>% group_by(IssueID) %>% summarize(sumcashflow = f(cashflow))
Using data.table
:
setDT(d)[,.(f(cashflow)), by=IssueID]
Output:
IssueID sumcashflow
1: AA -2000, 75, 75, 1075, 25, 1025
2: AB -3000, 175, 175, 2175, 75, 1075
Input:
d = data.table(
IssueID = c("AA","AA","AB","AB"),
BondID = c("AA1", "AA2","AB1", "AB2"),
cashflow = list(c(-1000,50,50,1050),
c(-1000,25,25,25,25,1025),
c(-2000,100,100,2100),
c(-1000,75,75,75,75,1075))
Upvotes: 1
Reputation: 79288
df1 %>%
group_by(IssueID) %>%
summarise(val = list(colSums(do.call(qpcR:::rbind.na, cashflow), na.rm = TRUE)))
# A tibble: 2 x 2
IssueID val
<chr> <list>
1 AA <dbl [6]>
2 AB <dbl [6]>
val:
[[1]]
[1] -2000 75 75 1075 25 1025
[[2]]
[1] -3000 175 175 2175 75 1075
Upvotes: 1
Reputation: 887501
Here is one option - grouped by 'IssueID', convert the cashflow
list to a named list
with sequence (row_number()
), then convert to a tibble using enframe
, unnest
the list
column, create a grouping by the rowid
of the 'name' (in case there are unequal lengths), and get the sum
of 'value', pull
the column as a list
library(dplyr)
library(tibble)
library(tidyr)
library(data.table)
out <- df1 %>%
group_by(IssueID) %>%
summarise(sumcashflow = setNames(cashflow, row_number()) %>%
enframe %>%
unnest(value) %>%
group_by(grp = rowid(name)) %>%
summarise(value = sum(value, na.rm = TRUE)) %>%
pull(value) %>%
list(.))
-output
> out$sumcashflow
[[1]]
[1] -2000 75 75 1075 25 1025
[[2]]
[1] -3000 175 175 2175 75 1075
> out
# A tibble: 2 × 2
IssueID sumcashflow
<chr> <list>
1 AA <dbl [6]>
2 AB <dbl [6]>
or using base R
with split
lst1 <- lapply(split(df1$cashflow, df1$IssueID), \(x) {
mx <- max(lengths(x))
rowSums(sapply(x, `length<-`, mx), na.rm = TRUE)
})
> lst1
$AA
[1] -2000 75 75 1075 25 1025
$AB
[1] -3000 175 175 2175 75 1075
df1 <- structure(list(IssueID = c("AA", "AA", "AB", "AB"), BondID = c("AA1",
"AA2", "AB1", "AB1"), cashflow = list(c(-1000, 50, 50, 1050),
c(-1000, 25, 25, 25, 25, 1025), c(-2000, 100, 100, 2100),
c(-1000, 75, 75, 75, 75, 1075))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 1