Reputation: 327
Let's say I have a data frame with two columns for now:
df<- data.frame(scores_set1=c(32,45,65,96,45,23,23,14),
scores_set2=c(32,40,60,98,21,23,21,63))
I want to randomly select some rows:
selected_indeces<- sample(c(1:8), 4, replace = FALSE)
Now I want to add up the values at selected_indeces sequentially: for the first selected index I just need the value of that row; for the second I want the second selected value plus the first; and for the nth index I want the sum of all values selected so far plus the value of the nth selected row. So I first need a matrix to hold the results:
cumulative_loss<-matrix(rep(NA,8*2),nrow=8,ncol=2)
and then one loop over the columns and another over the selected indices:
for (s in 1:ncol(df)) #for each column
{
for (i in 1:length(selected_indeces)) #for each randomly selected index
{
if (i==1)
{
cumulative_loss[i,s]<- df[selected_indeces[i],s]
}
if (i > 1)
{
cumulative_loss[i,s]<- df[selected_indeces[i],s] +
df[selected_indeces[i-1],s]
}
}
}
The script runs, although it might be a naive way of doing this. The problem is that for i = 4 it only adds the values of the fourth and third selections, whereas I want it to add the first, second, third and fourth random selections and return that sum.
Upvotes: 0
Views: 95
Reputation: 34703
Here's a way to do this with data.table (taking into account your comment on @bgoldst's answer):
library(data.table); setDT(df)
#sample 4 elements of each column (i.e., every element of .SD), then cumsum them
df[ , lapply(.SD, function(x) cumsum(sample(x, 4)))]
If you want to use different indices for each column, I would pre-choose them first:
set.seed(1023)
idx <- lapply(integer(ncol(df)), function(...) sample(nrow(df), 4))
idx
# [[1]] #indices for column 1
# [1] 2 8 6 3
#
# [[2]] #indices for column 2
# [1] 4 8 5 1
Then modify the above slightly:
df[ , lapply( seq_along(.SD), function(jj) cumsum(.SD[[jj]][ idx[[jj]] ]) )]
This is the craziest compendium of brackets/parentheses I've ever written in a functional line of code, so I guess it makes sense to break things down a bit:
seq_along(.SD) picks out the index number of each column, jj.
.SD[[jj]] selects the jth column and idx[[jj]] selects the indices for that column, so .SD[[jj]][idx[[jj]]] picks the idx[[jj]] rows of the jth column; this is equivalent to .SD[idx[[jj]], jj, with = FALSE].
cumsum then accumulates the length(idx[[jj]]) rows we chose for column jj.
Result:
# V1 V2
# 1: 45 98
# 2: 59 161
# 3: 82 182
# 4: 147 214
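As a sanity check, the same numbers can be reproduced in base R without data.table. This is only a sketch: the indices are hard-coded from the idx printed above (an assumption made so the check does not depend on the random number generator), and res is just an illustrative name.

```r
# Data from the question
df <- data.frame(scores_set1 = c(32, 45, 65, 96, 45, 23, 23, 14),
                 scores_set2 = c(32, 40, 60, 98, 21, 23, 21, 63))

# Indices copied from the printed idx above (hard-coded, not re-sampled)
idx <- list(c(2, 8, 6, 3), c(4, 8, 5, 1))

# Pair each column with its own index vector, pick those rows, accumulate
res <- mapply(function(col, rows) cumsum(col[rows]), df, idx)
res
```

Each column of res should match the corresponding V1/V2 column in the result above.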
Upvotes: 2
Reputation: 886938
With dplyr, if we want to sample each column separately and do the cumsum, we can use mutate_each and then select the first 4 with head.
library(dplyr)
df %>%
mutate_each(funs(cumsum(sample(.)))) %>%
head(.,4)
If the sample needs to be for the whole dataset:
df %>%
slice(sample(row_number(), 4)) %>%
mutate_each(funs(cumsum))
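mutate_each() and funs() have been deprecated in later dplyr releases. A rough equivalent of the two pipelines above, assuming dplyr 1.0.0 or newer (where across() and slice_sample() are available), might look like this; the fixed-row variant at the end is only there to make the behaviour easy to verify.

```r
library(dplyr)

# Data from the question
df <- data.frame(scores_set1 = c(32, 45, 65, 96, 45, 23, 23, 14),
                 scores_set2 = c(32, 40, 60, 98, 21, 23, 21, 63))

# Shuffle each column separately, accumulate, keep the first 4 rows
per_column <- df %>%
  mutate(across(everything(), ~ cumsum(sample(.x)))) %>%
  head(4)

# Sample 4 whole rows, then accumulate each column
whole_rows <- df %>%
  slice_sample(n = 4) %>%
  mutate(across(everything(), cumsum))

# Deterministic variant with hand-picked rows, for checking by eye
fixed_rows <- df %>%
  slice(c(8, 2, 3, 6)) %>%
  mutate(across(everything(), cumsum))
fixed_rows
```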
Upvotes: 0
Reputation: 35314
Conveniently, cumsum() works on data.frames directly, in which case it runs on each column independently. Thus we can index out the selected rows of df with an index operation and pass the result directly to cumsum() to get the required output:
set.seed(0L);
sel <- sample(1:8,4L);
sel;
## [1] 8 2 3 6
df[sel,];
## scores_set1 scores_set2
## 8 14 63
## 2 45 40
## 3 65 60
## 6 23 23
cumsum(df[sel,]);
## scores_set1 scores_set2
## 8 14 63
## 2 59 103
## 3 124 163
## 6 147 186
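For comparison, the loop from the question can be repaired by adding the running total from the previous step rather than the previous selected value. The following is a minimal sketch of that idea; sel is hard-coded from the output above (an assumption, since sample() can return different values for the same seed across R versions).

```r
# Data from the question
df <- data.frame(scores_set1 = c(32, 45, 65, 96, 45, 23, 23, 14),
                 scores_set2 = c(32, 40, 60, 98, 21, 23, 21, 63))

sel <- c(8, 2, 3, 6)  # hard-coded from the printed sel above

# One row per selected index, one column per data frame column
cumulative_loss <- matrix(NA_real_, nrow = length(sel), ncol = ncol(df),
                          dimnames = list(NULL, names(df)))

for (s in seq_len(ncol(df))) {      # for each column
  for (i in seq_along(sel)) {       # for each selected index
    if (i == 1) {
      cumulative_loss[i, s] <- df[sel[i], s]
    } else {
      # previous running total + current value (not just the previous value)
      cumulative_loss[i, s] <- cumulative_loss[i - 1, s] + df[sel[i], s]
    }
  }
}
cumulative_loss

# Agrees with the one-liner
same <- all(cumulative_loss == as.matrix(cumsum(df[sel, ])))
same
```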
To select different indexes for each column, we can use apply():
set.seed(0L);
apply(df,2L,function(col) cumsum(col[sample(1:8,4L)]));
## scores_set1 scores_set2
## [1,] 14 63
## [2,] 59 103
## [3,] 124 126
## [4,] 147 147
If you want to compute the indexes in advance, it becomes slightly trickier. Here's one way of doing it:
set.seed(0L);
sels <- replicate(2L,sample(1:8,4L)); sels;
## [,1] [,2]
## [1,] 8 8
## [2,] 2 2
## [3,] 3 6
## [4,] 6 5
sapply(seq_len(ncol(df)),function(ci) cumsum(df[[ci]][sels[,ci]]));
## [,1] [,2]
## [1,] 14 63
## [2,] 59 103
## [3,] 124 126
## [4,] 147 147
Upvotes: 3