Reputation: 740
I have one dataframe (df1) that looks as follows. It indicates years when a company was active in a specific market.
Company Country Year
A Austria 2010
A Germany 2010
A Austria 2011
B Italy 2010
I now have a second dataframe (df2) that looks as follows. It lists all investments of a company in a country at a given time, by investment type as dummy variabes.
Company Country Year JointVenture M&A Greenfield
A Austria 2010 1 0 0
A Austria 2010 0 1 0
A Austria 2010 1 0 0
...
My question now is as follows: I want to add a new column to df1, including the "countif" of each investment type as indicated in df2. For instance, new df1:
Company Country Year Count.JointVenture Count.M&A Count.Greenfield
A Austria 2010 2 1 0
A Germany 2010 ...........
A Austria 2011
B Italy 2010
Also, how would I be able to then add new columns to df1 transforming these counts into dummy variables (1 if >0; 0 if 0)?
Thanks and sorry for this basic question, but I did not find a fitting solutions in existing threads.
Cheers, Martin
Upvotes: 0
Views: 1138
Reputation: 680
Using dplyr::summarise_each
and merge
with Martin's data.
df <- fread("Company Country Year
A Austria 2010
A Germany 2010
A Austria 2011
B Italy 2010")
df2 <- fread("Company Country Year JointVenture MA Greenfield
A Austria 2010 1 0 0
A Austria 2010 0 1 0
A Austria 2010 1 0 0")
library(dplyr)
df2 %>%
group_by(Company, Country, Year) %>%
summarise_each(funs(sum), JointVenture:Greenfield) %>%
full_join(df, by = c("Company", "Country", "Year")) -> df
edits: replaced a summarise
with a summarise_each
with input from @zacdav and replace the merge
by full_join
to stay in dplyr
Upvotes: 1
Reputation: 23879
I throw my data.table
attempt into the arena:
df <- fread("Company Country Year
A Austria 2010
A Germany 2010
A Austria 2011
B Italy 2010")
df2 <- fread("Company Country Year JointVenture M&A Greenfield
A Austria 2010 1 0 0
A Austria 2010 0 1 0
A Austria 2010 1 0 0")
setkey(df2, Company, Country, Year)
df2[,c("JointVenture", "M&A", "Greenfield") := .(sum(JointVenture), sum(`M&A`), sum(Greenfield)), by=.(Company, Country, Year)]
merge(x=df, y=unique(df2), by=c("Company", "Country", "Year"), all.x=T, all.y=F, suffixes = c("", "Count."))
Which results in
Company Country Year JointVenture M&A Greenfield
1: A Austria 2010 2 1 0
2: A Austria 2011 NA NA NA
3: A Germany 2010 NA NA NA
4: B Italy 2010 NA NA NA
Upvotes: 1
Reputation: 131
using aggregate() and ifelse() functions
# test data
df <- data.frame(Company = rep("A", 3),
Country = rep("Austria", 3),
Year = rep(2010, 3),
JointVenture = c(1,0,1),
MnA = c(0,1,0),
Greenfield = rep(0,3))
# this is the new df
counts <- aggregate(cbind(JointVenture, MnA, Greenfield)~Country+Company+Year, data = df, FUN = sum)
# dummy
counts$dummyJointVenture <- ifelse(counts$JointVenture > 0, 1, 0)
counts$dummyMnA <- ifelse(counts$MnA > 0, 1, 0)
counts$dummyGreenfield <- ifelse(counts$Greenfield > 0, 1, 0)
Upvotes: 2