user11418708
user11418708

Reputation: 902

Means based on column

I would like to calculate the mean of the below data frame based on the index. My data has the following form.

  index  t1  t2  t3  t4
    10   1    4   7   10
    20   2    5   8   11
    30   3    6   9   0
    40   1    0   0   0

I defined the following steps to calculate the means of index column:

Step 1: Sum columns based on time-steps:

 index  t1   t2  t3  t4  sum
    10  1    4   7   10  22
    20  2    5   8   11  26
    30  3    6   9   0   18
    40  0    1   0   0   1

Step 2: Following the sum variable create a data frame to calculate the means of index

    10 20 30 40 
    10 20 30 
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20 30
    10 20
    10 20
    10 20
    10 20
       20
       20
       20
       20

I don't know how to create the df at step 2 (eg. how to return the number of occurrences based on sum value)


Data sample:

df<-structure(list(index=c (10,20,30,40), 
                   t1 = c(1, 2, 3, 1), 
                   t2 = c(4, 5, 6, 0), 
                   t3 = c(7, 8,9,  0),
                   t4 = c(10, 11, 0, 0)), row.names = c(NA,4L), class = "data.frame")

Upvotes: 0

Views: 47

Answers (2)

Edo
Edo

Reputation: 7818

A solution with base R that returns a Dataframe at the end.

x <- rep(df$index, rowSums(df[-1]))
x <- split(x,x)
x <- lapply(x, `length<-`, max(lengths(x)))
data.frame(x)
#>    X10 X20 X30 X40
#> 1   10  20  30  40
#> 2   10  20  30  NA
#> 3   10  20  30  NA
#> 4   10  20  30  NA
#> 5   10  20  30  NA
#> 6   10  20  30  NA
#> 7   10  20  30  NA
#> 8   10  20  30  NA
#> 9   10  20  30  NA
#> 10  10  20  30  NA
#> 11  10  20  30  NA
#> 12  10  20  30  NA
#> 13  10  20  30  NA
#> 14  10  20  30  NA
#> 15  10  20  30  NA
#> 16  10  20  30  NA
#> 17  10  20  30  NA
#> 18  10  20  30  NA
#> 19  10  20  NA  NA
#> 20  10  20  NA  NA
#> 21  10  20  NA  NA
#> 22  10  20  NA  NA
#> 23  NA  20  NA  NA
#> 24  NA  20  NA  NA
#> 25  NA  20  NA  NA
#> 26  NA  20  NA  NA

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

You can take sum of rows using rowSums and then repeat index value sum times.

df$sum <- rowSums(df[-1])
lst <- Map(rep, df$index, df$sum)
sapply(lst, `[`, 1:max(lengths(lst)))

#      [,1] [,2] [,3] [,4]
# [1,]   10   20   30   40
# [2,]   10   20   30   NA
# [3,]   10   20   30   NA
# [4,]   10   20   30   NA
# [5,]   10   20   30   NA
# [6,]   10   20   30   NA
# [7,]   10   20   30   NA
# [8,]   10   20   30   NA
# [9,]   10   20   30   NA
#[10,]   10   20   30   NA
#[11,]   10   20   30   NA
#[12,]   10   20   30   NA
#[13,]   10   20   30   NA
#[14,]   10   20   30   NA
#[15,]   10   20   30   NA
#[16,]   10   20   30   NA
#[17,]   10   20   30   NA
#[18,]   10   20   30   NA
#[19,]   10   20   NA   NA
#[20,]   10   20   NA   NA
#[21,]   10   20   NA   NA
#[22,]   10   20   NA   NA
#[23,]   NA   20   NA   NA
#[24,]   NA   20   NA   NA
#[25,]   NA   20   NA   NA
#[26,]   NA   20   NA   NA

Upvotes: 2

Related Questions