Reputation: 447

Looping through Columns of Dataframe and dividing values by a value indexed by column name in another dataframe?

I have a data frame like below with lots of columns and data in

                 Label 2015-08-24 2015-08-25
   1               AxG   179238.8   174160.3
   2              CHaF   203544.2   199411.9
   3              HHHH   130138.9   129182.2
   4               fLS   146367.0   148708.0
   5               KSU   194584.7   188508.0
   6              MqCC   272922.7   273369.5
   7              BEPU        0.0        0.0
   8              SSEX   107566.4   111552.8
   9               QQQ   248380.4   257064.2
   10              AAA   258735.4   260415.5

I have another dataframe that has row index the same as the column names in the first (in this case dates). I want to divide every value in the column of the first dataframe with the corresponding value in the second

                      totals
      2015-08-24      1741479
      2015-08-25      199411.9

My first idea was to do something like below (I come from a pythonic background, albeit a while ago)

        > for(i in names(df1)){
         + df1[[df1[,i]/df2[i,]]]

But this is the error I receive:

Error in .subset2(x, i, exact = exact) : no such index at level 1 In addition: Warning message: In Ops.factor(weights[, i], weight_totals[i, ]) : ‘/’ not meaningful for factors

Any ideas?

Upvotes: 3

Answers (3)

mtoto

Reputation: 24178

You could transpose your second data.frame and multiply the matching columns using intersect()

dft <- data.frame(t(df2), check.names = FALSE) # transpose df2

mapply("/", df[intersect(names(df), names(dft))],
        dft[intersect(names(df), names(dft))]) # perfrom division
#      2015-08-24 2015-08-25
# [1,] 0.10292332  0.8733696
# [2,] 0.11688008  1.0000000
# [3,] 0.07472895  0.6478159
# [4,] 0.08404753  0.7457328
# [5,] 0.11173531  0.9453197
# [6,] 0.15671892  1.3708786
# [7,] 0.00000000  0.0000000
# [8,] 0.06176727  0.5594089
# [9,] 0.14262612  1.2891116
#[10,] 0.14857222  1.3059176

Data used:

df <- structure(list(Label = structure(c(2L, 4L, 6L, 5L, 7L, 8L, 3L, 
10L, 9L, 1L), .Label = c("AAA", "AxG", "BEPU", "CHaF", "fLS", 
"HHHH", "KSU", "MqCC", "QQQ", "SSEX"), class = "factor"), `2015-08-24` = c(179238.8, 
203544.2, 130138.9, 146367, 194584.7, 272922.7, 0, 107566.4, 
248380.4, 258735.4), `2015-08-25` = c(174160.3, 199411.9, 129182.2, 
148708, 188508, 273369.5, 0, 111552.8, 257064.2, 260415.5)), .Names = c("Label", 
"2015-08-24", "2015-08-25"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

df2 <- structure(list(totals = c(1741479, 199411.9)), .Names = "totals", row.names = c("2015-08-24", 
"2015-08-25"), class = "data.frame")

Upvotes: 4

Jaap

Reputation: 83215

Altough @PierreLafortune has the shortest solution, using sweep requires that the order of the column names of df and the rownames of df2 has to be the same. When using for (and I suppose also with the intersect method of @mtoto) the column names of df and the rownames of df2 don't have to be in the same order to get the desired results. Another advantage is that df is updated immediately:

for(i in names(df)[-1]) 
  df[, i] <- df[, i] / df2$totals[match(i, row.names(df2))]

which gives:

> df
   Label 2015-08-24 2015-08-25
1    AxG 0.10292332  0.8733696
2   CHaF 0.11688008  1.0000000
3   HHHH 0.07472895  0.6478159
4    fLS 0.08404753  0.7457328
5    KSU 0.11173531  0.9453197
6   MqCC 0.15671892  1.3708786
7   BEPU 0.00000000  0.0000000
8   SSEX 0.06176727  0.5594089
9    QQQ 0.14262612  1.2891116
10   AAA 0.14857222  1.3059176

Upvotes: 2

Pierre L

Reputation: 28441

We can also use sweep. It is a useful function for any case where a vector of values must be applied over an array. We choose margin 2 to sweep over the columns (we could have chosen 1 for rows):

sweep(df[-1], 2, df2$totals, '/')
#      2015-08-24 2015-08-25
# [1,] 0.10292332  0.8733696
# [2,] 0.11688008  1.0000000
# [3,] 0.07472895  0.6478159
# [4,] 0.08404753  0.7457328
# [5,] 0.11173531  0.9453197
# [6,] 0.15671892  1.3708786
# [7,] 0.00000000  0.0000000
# [8,] 0.06176727  0.5594089
# [9,] 0.14262612  1.2891116
# [10,] 0.14857222  1.3059176

Upvotes: 5

Looping through Columns of Dataframe and dividing values by a value indexed by column name in another dataframe?

Answers (3)

Related Questions