Reputation: 740
I am facing the following problem. I have a list of M&A transactions, each transactions includes data on (1) acquirer, (2) vendor, (3) target. The data is structured in a way that the relationship can be n:n:n and looks similar to the following:
dealid acquirer target vendor
1 FirmA FirmB FirmC
1 FirmD FirmE
2 .....................
So the problem is that the rows within a deal have no meaning per se, so e.g., FirmD is also a co-acquirer for FirmB.
I now need to create all possible acquirer-target-vendor combinations within each dealid. I have managed to expand grids with the expand.grid
function or simply via merge
. However, I do not know how to expand a grid of all possible combinations within groups.
Upvotes: 4
Views: 3678
Reputation: 4187
With split
as @Sotos mentioned in the comments:
l1 <- split(df1, df1$dealid)
l2 <- lapply(l1, function(x) unique(with(x, expand.grid(acquirer, na.omit(target), vendor))))
df2 <- cbind.data.frame(dealid = rep(names(l2), sapply(l2, nrow)), do.call(rbind, l2))
This results in:
> df2
dealid Var1 Var2 Var3
1.1 1 FirmA FirmB FirmC
1.2 1 FirmD FirmB FirmC
1.3 1 FirmA FirmB FirmE
1.4 1 FirmD FirmB FirmE
2.1 2 FirmA FirmF FirmC
2.2 2 FirmD FirmF FirmC
2.3 2 FirmG FirmF FirmC
2.4 2 FirmA FirmF FirmE
2.5 2 FirmD FirmF FirmE
2.6 2 FirmG FirmF FirmE
Upvotes: 0
Reputation: 107567
Consider base R's by
, the function that slices a dataframe by factor grouping (dealid) allowing extended iterative operations such as expand.grid
to return a list of dataframes. Below uses same data sample as @PLapointe and @akrun:
dfList <- by(df, df$dealid, function(i){
tmp <- cbind(dealid=max(i$dealid),
expand.grid(acquirer=i$acquirer, target=i$target, vendor=i$vendor))
tmp[!is.na(tmp$target),]
})
newdf <- unique(do.call(rbind, dfList))
row.names(newdf) <- NULL
newdf
# dealid acquirer target vendor
# 1 1 FirmA FirmB FirmC
# 2 1 FirmD FirmB FirmC
# 3 1 FirmA FirmB FirmE
# 4 1 FirmD FirmB FirmE
# 5 2 FirmA FirmF FirmC
# 6 2 FirmD FirmF FirmC
# 7 2 FirmG FirmF FirmC
# 8 2 FirmA FirmF FirmE
# 9 2 FirmD FirmF FirmE
# 10 2 FirmG FirmF FirmE
Upvotes: 1
Reputation: 886948
We can use data.table
library(data.table)
setDT(df1)[, CJ(acquirer = acquirer, target = target, vendor = vendor,
unique = TRUE), dealid][!is.na(target)]
# dealid acquirer target vendor
#1: 1 FirmA FirmB FirmC
#2: 1 FirmA FirmB FirmE
#3: 1 FirmD FirmB FirmC
#4: 1 FirmD FirmB FirmE
#5: 2 FirmA FirmF FirmC
#6: 2 FirmA FirmF FirmE
#7: 2 FirmD FirmF FirmC
#8: 2 FirmD FirmF FirmE
#9: 2 FirmG FirmF FirmC
#10: 2 FirmG FirmF FirmE
df1 <- structure(list(dealid = c(1L, 1L, 2L, 2L, 2L), acquirer = c("FirmA",
"FirmD", "FirmA", "FirmD", "FirmG"), target = c("FirmB", NA,
NA, NA, "FirmF"), vendor = c("FirmC", "FirmE", "FirmC", "FirmE",
"FirmE")), .Names = c("dealid", "acquirer", "target", "vendor"
), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 3
Reputation: 16277
You could do that with dplyr
and expand
from tidyr
.
df <- read.table(text="dealid acquirer target vendor
1 FirmA FirmB FirmC
1 FirmD NA FirmE
2 FirmA NA FirmC
2 FirmD NA FirmE
2 FirmG FirmF FirmE",header=TRUE,stringsAsFactors=FALSE)
library(dplyr);library(tidyr)
df%>%
group_by(dealid)%>%
expand(acquirer, target, vendor)
dealid acquirer target vendor
<int> <chr> <chr> <chr>
1 1 FirmA FirmB FirmC
2 1 FirmA FirmB FirmE
3 1 FirmD FirmB FirmC
4 1 FirmD FirmB FirmE
5 2 FirmA FirmF FirmC
6 2 FirmA FirmF FirmE
7 2 FirmD FirmF FirmC
8 2 FirmD FirmF FirmE
9 2 FirmG FirmF FirmC
10 2 FirmG FirmF FirmE
Upvotes: 7