crumbly

Reputation: 79

How to cap the cumulative row sum at 1 in a data frame

I have the following data frame u:

   u<-data.frame(a1=c(0.1,0.2,0.4),a2=c(0.5,0.4,0.8),a3=c(0.4,0.6,0.7),a4=c(0.1,0.4,0.6))

df          
a1  a2  a3  a4
0.1 0.5 0.4 0.1
0.2 0.4 0.6 0.4
0.4 0.8 0.7 0.6

I am trying to create a new data frame in which the cumulative row sum never exceeds 1. In the first row the sum reaches 1 at a3, so a4 is set to zero. In the second row the sum reaches 1.2 at a3, so a3 is reduced to 0.4 and a4 is set to zero. In the third row the sum already reaches 1.2 at a2, so a2 is reduced to 0.6 and both a3 and a4 are set to zero. The resulting data frame should be:

df          
a1  a2  a3  a4
0.1 0.5 0.4 0
0.2 0.4 0.4 0
0.4 0.6 0   0

Upvotes: 3

Views: 116

Answers (1)

Batanichek

Reputation: 7871

If the data frame contains only positive numbers, you can do something like this:

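u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))
z <- t(apply(u, 1, cumsum)) - 1  # cumulative row sum minus 1
z[z < 0] <- 0                    # keep only the excess over 1
u2 <- u - z                      # subtract the excess from each cell
u2[u2 < 0] <- 0                  # cells entirely past the cap become 0
u2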


  a1  a2  a3 a4
1 0.1 0.5 0.4  0
2 0.2 0.4 0.4  0
3 0.4 0.6 0.0  0
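
To see why subtracting the clamped excess works, it may help to print the intermediate matrices. The following is only a sketch in base R on the example data; the names cs, excess and capped are mine and not part of the code above, and the tolerance in the final check is an arbitrary choice to absorb floating-point error.

```r
# Trace the cumsum approach on the example data (base R only).
u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))

cs     <- t(apply(u, 1, cumsum))            # running total along each row
excess <- pmax(cs - 1, 0)                   # how far each running total is above 1
capped <- pmax(as.matrix(u) - excess, 0)    # remove the excess, floor at 0

print(round(cs, 2))
print(round(excess, 2))
print(round(capped, 2))

# Sanity check: no row of the result sums to more than 1
# (1e-9 is an assumed tolerance for floating-point error).
stopifnot(all(rowSums(capped) <= 1 + 1e-9))
```

The cell where the running total first crosses 1 loses exactly the overshoot, and every later cell has an excess at least as large as its own value, so it drops to 0. That argument relies on the values being non-negative, which is why the condition above matters.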

Or, using pmax (a bit shorter):

u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))
z <- pmax(t(apply(u, 1, cumsum)) - 1, 0)  # excess of the cumulative sum over 1, floored at 0
u2 <- pmax(as.matrix(u - z), 0)           # subtract the excess, floor at 0
u2

Or, using the matrixStats package:

library(matrixStats)  # provides rowCumsums()
u2 <- as.matrix(u)
pmax(u2 - pmax(rowCumsums(u2) - 1, 0), 0)

The last variant is the fastest of the three:

Unit: microseconds
 expr     min      lq     mean   median      uq      max neval
 f1() 804.139 829.798 909.1229 861.2580 889.818 4150.103   100
 f2() 764.422 789.635 874.3958 808.8240 848.763 3832.822   100
 f3()  96.390 110.669 126.7079 119.5955 131.420  253.469   100
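
The definitions of f1(), f2() and f3() are not shown, so here is a hypothetical reconstruction of how such a benchmark could be set up. It assumes that the three functions wrap the three variants above in the order they were posted, and that the timings come from the microbenchmark package (the output format matches, but that is a guess). The flag have_ms is my own helper name. Timings will differ from machine to machine.

```r
# Hypothetical reconstruction: f1(), f2(), f3() are assumed to wrap the
# three variants in posting order; the original definitions are not shown.
u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))

f1 <- function() {
  z <- t(apply(u, 1, cumsum)) - 1
  z[z < 0] <- 0
  u2 <- u - z
  u2[u2 < 0] <- 0
  u2
}

f2 <- function() {
  z <- pmax(t(apply(u, 1, cumsum)) - 1, 0)
  pmax(as.matrix(u - z), 0)
}

# f3() needs matrixStats, so it is only defined when the package is installed.
have_ms <- requireNamespace("matrixStats", quietly = TRUE)
if (have_ms) {
  f3 <- function() {
    u2 <- as.matrix(u)
    pmax(u2 - pmax(matrixStats::rowCumsums(u2) - 1, 0), 0)
  }
}

# Run the timing only if both optional packages are available.
if (have_ms && requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(f1(), f2(), f3(), times = 100))
}
```

Note that f1() returns a data frame while f2() and f3() return matrices, so part of the gap between variants may come from data frame overhead rather than from the cumulative sum itself.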

Upvotes: 1
