CISCO
CISCO

Reputation: 539

Sum only numerical columns and divide values by total

I am having trouble with some calculations on a data frame.

Here is my DF (with many more rows and columns)

What I am trying to do is:

Step (1) - For each month I would like to sum the assets columns. For example all the assets_Jan_2018 and output as a DF.

Step (2) - Then, for each firm, I would like to divide assets each month by the total for the month, i.e. the values calculated in step (1). For example, Firm A Jan_2018 = 210234/1466742 = 0.14334

Step (3) - Then I would like to take the values from step (2) and multiply by the corresponding returns. =0.143334*4.5 = 0.645003

Step (4) - Finally, I would like to sum each column in step (3) and output as a DF

 Firm   Assets_Jan_2018  Assets_ Feb_2018 Returns_Jan_2018 Returns_Feb_2018
 A          210234             235425           4.5            6.7
 B          123144             127124           5.3            1.2
 c          897897             789798           1.4            6.2
 D          235467             897342           9.7            3.2

I tried colSums and many other things but kept getting error messages.

Upvotes: 1

Views: 497

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388982

We can first separate "Assets" and "Returns" columns and then use colSums and multiply values accordingly

asset_col <- grep("^Assets", names(df1))
return_col <- grep("^Returns", names(df1))
colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])

#Returns_Jan_2018 Returns_Feb_2018 
#        3.504230         4.633941 

To break it down and make clear each step

Step 1 - For each month I would like to sum the assets columns

colSums(df1[asset_col])
#Assets_Jan_2018 Assets_Feb_2018 
#    1466742         2049689 

Step 2 - For each firm, I would like to divide assets each month by the total for the month

t(t(df1[asset_col])/colSums(df1[asset_col]))
#     Assets_Jan_2018 Assets_Feb_2018
#[1,]      0.14333400      0.11485889
#[2,]      0.08395751      0.06202112
#[3,]      0.61217106      0.38532577
#[4,]      0.16053744      0.43779422

Step 3 - Then I would like to take the values from step (2) and multiply by the corresponding returns

t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col]

#  Returns_Jan_2018 Returns_Feb_2018
#1        0.6450030       0.76955455
#2        0.4449748       0.07442534
#3        0.8570395       2.38901980
#4        1.5572131       1.40094151

Step 4 - I would like to sum each column in step (3)

colSums(t(t(df1[asset_col])/colSums(df1[asset_col])) * df1[return_col])

#Returns_Jan_2018 Returns_Feb_2018 
#        3.504230         4.633941 

Upvotes: 3

hello_friend
hello_friend

Reputation: 5788

Probably a lot less concise then Ronak's answer, but demonstrates the sweep functionality.

Construct example df:

df <- data.frame(cbind(Firm = c("A", "B", "C", "D"),
                       Assets_Jan_2018 = as.numeric(c(210234, 123144, 897897, 235467)),
                       Assets_Feb_2018 = as.numeric(c(235425, 127124, 789798, 897342)),
                       Returns_Jan_2018 = as.double(c(4.5,  5.3,  1.4, 9.7)),
                       Returns_Feb_2019 = as.double(c(6.7, 1.2, 6.2, 3.2))))

Ensure numeric data types:

df <- type.convert(df)

Calculate the weighted returns:

FirmWeightedReturns <- cbind(Firm = df$Firm,
                             sweep(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))],
                                   2,
                                   as.numeric(sapply(df[sapply(df, is.numeric) & !(grepl("returns", tolower(colnames(df))))], sum)), '/')
                                  * df[grepl("returns", tolower(colnames(df)))])

Summarise as a df:

TotalReturns <- data.frame(lapply(FirmWeightedReturns[sapply(FirmWeightedReturns, is.numeric)], sum))

Output to console:

TotalReturns

Upvotes: 1

Related Questions