Reputation: 539
I am having trouble with some calculations on a data frame.
Here is my DF (with many more rows and columns)
What I am trying to do is:
Step (1) - For each month I would like to sum the assets columns. For example all the assets_Jan_2018 and output as a DF.
Step (2) - Then, for each firm, I would like to divide assets each month by the total for the month, i.e. the values calculated in step (1). For example, Firm A Jan_2018 = 210234/1466742 = 0.14334
Step (3) - Then I would like to take the values from step (2) and multiply by the corresponding returns. =0.143334*4.5 = 0.645003
Step (4) - Finally, I would like to sum each column in step (3) and output as a DF
Firm Assets_Jan_2018 Assets_ Feb_2018 Returns_Jan_2018 Returns_Feb_2018
A 210234 235425 4.5 6.7
B 123144 127124 5.3 1.2
c 897897 789798 1.4 6.2
D 235467 897342 9.7 3.2
I tried colSums and many other things but kept getting error messages.
Upvotes: 1
Views: 497
Reputation: 388982
We can first separate "Assets"
and "Returns"
columns and then use colSums
and multiply values accordingly
asset_col <- grep("^Assets", names(df1))
return_col <- grep("^Returns", names(df1))
colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])
#Returns_Jan_2018 Returns_Feb_2018
# 3.504230 4.633941
To break it down and make clear each step
Step 1 - For each month I would like to sum the assets columns
colSums(df1[asset_col])
#Assets_Jan_2018 Assets_Feb_2018
# 1466742 2049689
Step 2 - For each firm, I would like to divide assets each month by the total for the month
t(t(df1[asset_col])/colSums(df1[asset_col]))
# Assets_Jan_2018 Assets_Feb_2018
#[1,] 0.14333400 0.11485889
#[2,] 0.08395751 0.06202112
#[3,] 0.61217106 0.38532577
#[4,] 0.16053744 0.43779422
Step 3 - Then I would like to take the values from step (2) and multiply by the corresponding returns
t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col]
# Returns_Jan_2018 Returns_Feb_2018
#1 0.6450030 0.76955455
#2 0.4449748 0.07442534
#3 0.8570395 2.38901980
#4 1.5572131 1.40094151
Step 4 - I would like to sum each column in step (3)
colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])
#Returns_Jan_2018 Returns_Feb_2018
# 3.504230 4.633941
Upvotes: 3
Reputation: 5788
Probably a lot less concise then Ronak's answer, but demonstrates the sweep functionality.
Construct example df:
df <- data.frame(cbind(Firm = c("A", "B", "C", "D"),
Assets_Jan_2018 = as.numeric(c(210234, 123144, 897897, 235467)),
Assets_Feb_2018 = as.numeric(c(235425, 127124, 789798, 897342)),
Returns_Jan_2018 = as.double(c(4.5, 5.3, 1.4, 9.7)),
Returns_Feb_2019 = as.double(c(6.7, 1.2, 6.2, 3.2))))
Ensure numeric data types:
df <- type.convert(df)
Calculate the weighted returns:
FirmWeightedReturns <- cbind(Firm = df$Firm,
sweep(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))],
2,
as.numeric(sapply(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))], sum)), '/')
* df[grepl("returns", tolower(colnames(df)))])
Summarise as a df:
TotalReturns <- data.frame(lapply(FirmWeightedReturns[sapply(FirmWeightedReturns, is.numeric)], sum))
Output to console:
TotalReturns
Upvotes: 1