I have the following data frame u
u<-data.frame(a1=c(0.1,0.2,0.4),a2=c(0.5,0.4,0.8),a3=c(0.4,0.6,0.7),a4=c(0.1,0.4,0.6))
u
   a1  a2  a3  a4
1 0.1 0.5 0.4 0.1
2 0.2 0.4 0.6 0.4
3 0.4 0.8 0.7 0.6
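I am trying to create a new data frame in which no row sum exceeds 1, filling the columns from left to right. In the first row the running sum reaches 1 at a3, so a4 is set to zero. In the second row the running sum reaches 1.2 at a3, so a3 is reduced to 0.4 and a4 is set to zero, keeping the row sum at 1. In the third row the running sum already reaches 1.2 at a2, so a2 is reduced to 0.6 and a3 and a4 are set to zero. The resulting data frame should be: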
   a1  a2  a3 a4
1 0.1 0.5 0.4  0
2 0.2 0.4 0.4  0
3 0.4 0.6 0.0  0
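To pin the rule down, a plain double loop that walks each row left to right with a remaining budget of 1 reproduces the desired output above. This is only a reference sketch for checking answers against; cap_rows_loop and its limit argument are hypothetical names, not from the question, and it assumes all values are non-negative.

```r
# Hypothetical reference implementation (not from the question): walk each
# row left to right, letting every cell take at most what is left of `limit`.
cap_rows_loop <- function(df, limit = 1) {
  m <- as.matrix(df)
  for (i in seq_len(nrow(m))) {
    remaining <- limit                        # budget left for this row
    for (j in seq_len(ncol(m))) {
      # take at most the remaining budget; max(0, ...) guards against a
      # tiny negative budget caused by floating-point rounding
      m[i, j] <- max(0, min(m[i, j], remaining))
      remaining <- remaining - m[i, j]
    }
  }
  as.data.frame(m)
}

u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))
cap_rows_loop(u)
```

The loop is easy to read but slow on large data; the vectorised answers below compute the same thing from row-wise cumulative sums.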
If all values in u are non-negative, you can do something like this:
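u<-data.frame(a1=c(0.1,0.2,0.4),a2=c(0.5,0.4,0.8),a3=c(0.4,0.6,0.7),a4=c(0.1,0.4,0.6))
z=t(apply(u,1,cumsum))-1 # amount by which each running row sum exceeds 1 (apply returns rows as columns, hence t())
z[z<0]=0                 # no reduction where the running sum is still <= 1
u2=u-z
u2[u2<0]=0               # columns after the crossing point go negative; set them to 0
u2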
a1 a2 a3 a4
1 0.1 0.5 0.4 0
2 0.2 0.4 0.4 0
3 0.4 0.6 0.0 0
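The same idea can be wrapped in a reusable function. This is a sketch: cap_row_sums and its limit argument are hypothetical names, and the non-negativity check encodes the assumption stated above, since the method relies on the running sum never decreasing once it has passed the limit.

```r
# Hypothetical helper (name and `limit` argument are illustrative): the
# cumsum approach above for any upper bound on the row sums.
cap_row_sums <- function(x, limit = 1) {
  m <- as.matrix(x)
  # the method assumes each running row sum never decreases
  stopifnot(is.numeric(m), all(m >= 0))
  if (ncol(m) == 1L) return(pmin(m, limit))  # apply() would drop the dims here
  # apply() over rows returns one column per row, hence t()
  overflow <- t(apply(m, 1, cumsum)) - limit
  overflow[overflow < 0] <- 0   # how far each running sum is past the limit
  out <- m - overflow
  out[out < 0] <- 0             # entries after the crossing point become 0
  out
}

u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))
cap_row_sums(u)               # same values as u2 above, as a matrix
cap_row_sums(u, limit = 0.5)  # every row now sums to 0.5
```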
Or, a bit shorter, using pmax:
u<-data.frame(a1=c(0.1,0.2,0.4),a2=c(0.5,0.4,0.8),a3=c(0.4,0.6,0.7),a4=c(0.1,0.4,0.6))
z=pmax(t(apply(u,1,cumsum))-1,0) # amount by which each running row sum exceeds 1, floored at 0
u2=pmax(as.matrix(u-z),0)
u2
Or using the matrixStats package:
library(matrixStats)
u2=as.matrix(u)
pmax(u2-pmax(rowCumsums(u2)-1,0),0)
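If matrixStats is not available, the row-wise cumulative sums can be built in base R with one vectorised addition per column, instead of one cumsum() call per row as with apply(). A sketch only: row_cumsums_base is a hypothetical helper, and I have not benchmarked it against rowCumsums().

```r
# Hypothetical base-R stand-in for matrixStats::rowCumsums():
# column j of the result is column j-1 of the result plus column j of m.
row_cumsums_base <- function(m) {
  cs <- m
  for (j in seq_len(ncol(m))[-1]) {
    cs[, j] <- cs[, j - 1] + m[, j]
  }
  cs
}

u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))
u2 <- as.matrix(u)
# same formula as the matrixStats version, with the base-R cumulative sums
res <- pmax(u2 - pmax(row_cumsums_base(u2) - 1, 0), 0)
res
```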
The last one is the fastest of my three variants (f1(), f2() and f3() in the timings below):
Unit: microseconds
 expr     min      lq     mean   median      uq      max neval
 f1() 804.139 829.798 909.1229 861.2580 889.818 4150.103   100
 f2() 764.422 789.635 874.3958 808.8240 848.763 3832.822   100
 f3()  96.390 110.669 126.7079 119.5955 131.420  253.469   100
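The timing table has the layout printed by the microbenchmark package. The definitions of f1(), f2() and f3() are not shown, so the wrappers below are an assumption: each is taken to run one of the three variants above, in order. The sketch first checks that the variants agree and only runs the benchmark when matrixStats and microbenchmark are installed; timings will differ from machine to machine.

```r
u <- data.frame(a1 = c(0.1, 0.2, 0.4), a2 = c(0.5, 0.4, 0.8),
                a3 = c(0.4, 0.6, 0.7), a4 = c(0.1, 0.4, 0.6))

# Assumed wrappers around the three variants (not shown in the answer)
f1 <- function() {
  z <- t(apply(u, 1, cumsum)) - 1
  z[z < 0] <- 0
  u2 <- u - z
  u2[u2 < 0] <- 0
  u2
}
f2 <- function() {
  z <- pmax(t(apply(u, 1, cumsum)) - 1, 0)
  pmax(as.matrix(u - z), 0)
}
f3 <- function() {
  u2 <- as.matrix(u)
  pmax(u2 - pmax(matrixStats::rowCumsums(u2) - 1, 0), 0)
}

# f1() returns a data frame, f2() and f3() a matrix; compare values only
stopifnot(isTRUE(all.equal(unname(as.matrix(f1())), unname(f2()))))

if (requireNamespace("matrixStats", quietly = TRUE) &&
    requireNamespace("microbenchmark", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(unname(f2()), unname(f3()))))
  print(microbenchmark::microbenchmark(f1(), f2(), f3()))
}
```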