user3841581

Reputation: 2747

Computing values for R data frame cells without using for loops

I have an R data frame with the following contents:

Serial N         year         current    Average 
   B              10            14          15
   B              10            16          15
   C              12            13          12
   D              40            20          20
   B              11            15          15
   C              12            11          12

I would like to add a new column based on the Average for each unique serial number, something like:

Serial N         year         current    Average      temp 
   B              10            14          15        (15+12+20)/15
   B              10            16          15        (15+12+20)/15
   C              12            13          12        (15+12+20)/12
   D              40            20          20        (15+12+20)/20
   B              11            15          15        (15+12+20)/15
   C              12            11          12        (15+12+20)/12

The temp column is the sum of the Average values for each unique Serial N (B, C and D) divided by the Average value for that row. How can I compute it without using for loops, given that rows 1, 2 and 5 (Serial N: B) have the same Average and therefore the same temp? I started with this:

for (i in unique(df$Serial_N)) {
    .........
}

but I got stuck, as I also need the averages for the other Serial N values. How can I do this?

Upvotes: 0

Views: 55

Answers (3)

Reese

Reputation: 88

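Applying unique.data.frame() to the Serial_N and Average columns keeps one row per group, so each group's Average is counted exactly once, even when two different groups happen to share the same Average value: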

df$temp <- sum((unique.data.frame(df[c("Serial_N","Average")]))$Average) / df$Average
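To illustrate why de-duplicating on the (Serial_N, Average) pair matters, here is a minimal sketch. The data frame df2 and the serial number "E" are hypothetical and not part of the question; they only set up a case where two groups share the same Average.

```r
# Hypothetical data: groups B and E both have an Average of 15
df2 <- data.frame(
  Serial_N = c("B", "B", "C", "E"),
  Average  = c(15, 15, 12, 15)
)

# De-duplicating the Average column alone collapses B and E into one value
naive_total <- sum(unique(df2$Average))                              # 15 + 12 = 27

# De-duplicating (Serial_N, Average) pairs keeps one row per group
paired_total <- sum(unique(df2[c("Serial_N", "Average")])$Average)   # 15 + 12 + 15 = 42
```

Calling unique() on a data frame dispatches to unique.data.frame(), so the two spellings behave the same here.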

Upvotes: 3

alistaire

Reputation: 43344

In base R, you can use either

df <- transform(df, temp = sum(tapply(df$Average, df$Serial_N, unique))/df$Average)

or

df$temp <- sum(tapply(df$Average, df$Serial_N, unique))/df$Average

both of which will give you

df
#   Serial_N year current Average     temp
# 1        B   10      14      15 3.133333
# 2        B   10      16      15 3.133333
# 3        C   12      13      12 3.916667
# 4        D   40      20      20 2.350000
# 5        B   11      15      15 3.133333
# 6        C   12      11      12 3.916667

tapply splits df$Average by the levels of df$Serial_N and calls unique on each group, which gives a single average per group (this relies on Average being constant within each serial number). You can then sum those values and divide by df$Average. transform adds the column, much like dplyr::mutate.
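The steps described above can be run end to end with the following sketch. It rebuilds the question's data by hand, assuming the serial number column is named Serial_N as in this answer; group_avg and total are illustrative names introduced here.

```r
# Rebuild the question's data (column name Serial_N is an assumption)
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  year     = c(10, 10, 12, 40, 11, 12),
  current  = c(14, 16, 13, 20, 15, 11),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# One Average per serial number: B = 15, C = 12, D = 20
group_avg <- tapply(df$Average, df$Serial_N, unique)

# Sum of the per-group averages: 15 + 12 + 20 = 47
total <- sum(group_avg)

# Vectorised division gives one value per row, so no loop is needed
df$temp <- total / df$Average
```

Splitting the one-liner into named intermediate values makes it easier to check that each group contributes its Average only once.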

Upvotes: 1

Gopala

Reputation: 10483

For example, you can try something like the following (assuming your computation matches):

df$temp <- sum(tapply(df$Average, df$SerialN, mean)) / df$Average

Resulting output:

  SerialN year current Average     temp
1       B   10      14      15 3.133333
2       B   10      16      15 3.133333
3       C   12      13      12 3.916667
4       D   40      20      20 2.350000
5       B   11      15      15 3.133333
6       C   12      11      12 3.916667

Upvotes: 3
