Reputation: 75
I have a data.frame df with 5 columns and around 10000 rows.
For each pair of consecutive rows, I am trying to compute the difference between the value of column 2 in row i+1 and the value of column 3 in row i, and write the result to a new column called 'diff'.
The df looks like:
` chr start end TBX21 width
1 chr1 4847746 4847778 53.37334 32
2 chr1 6204636 6204673 33.70947 37
3 chr1 6457267 6457345 31.83673 78
`
I tried: `
length = length(df[[1]])-1
for (i in 1:length) {
df$diff = df[i+1, 2] - df[i,3];
}
` and what I get is:
`chr start end TBX21 width diff
1 chr1 4847746 4847778 53.37334 32 9229
2 chr1 6204636 6204673 33.70947 37 9229
3 chr1 6457267 6457345 31.83673 78 9229
4 chr1 7078778 7078822 39.32772 44 9229`
I can't figure out my mistake. And yes, I'm a beginner in R.
Upvotes: 1
Views: 1700
Reputation: 426
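The problem is that you are assigning the difference to all rows at once, because you forgot to index your diff variable as well. Each iteration overwrites the whole column, so every row ends up with the value from the last iteration.
Replace `df$diff` with `df$diff[i]` and it should work.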
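A minimal sketch of the corrected loop, using three rows copied from the question. Creating the column with NA before the loop is my own addition rather than part of the fix above: it keeps the last row, which has no following row, as NA instead of leaving that cell to R's recycling rules. Using `seq_len()` instead of `1:length` is also my choice.

```r
# Sample rows copied from the question (only the columns the loop needs)
df <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(4847746, 6204636, 6457267),
  end   = c(4847778, 6204673, 6457345)
)

# Assumption: create the column up front so rows without a successor stay NA
df$diff <- NA_real_

# seq_len() yields an empty sequence for a one-row data frame,
# whereas 1:0 would iterate over c(1, 0)
for (i in seq_len(nrow(df) - 1)) {
  # Index the target with [i] so only row i is written
  df$diff[i] <- df$start[i + 1] - df$end[i]
}

df
```

With this placement, row i holds the difference between its own end and the start of row i+1.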
However, explicit looping in R is not always the best option, especially with large data sets. @Andrie's answer covers this well with a vectorized approach. If you have a small or medium-sized dataset, I'd keep the loop, as it is easier to read.
Upvotes: 1
Reputation: 179398
You can achieve this in a vectorised way, i.e. without using an explicit loop.
For example:
dat$diff <- c(NA, tail(dat$end, -1) - head(dat$start, -1))
dat
chr start end TBX21 width diff
1 chr1 4847746 4847778 53.37334 32 NA
2 chr1 6204636 6204673 33.70947 37 1356927
3 chr1 6457267 6457345 31.83673 78 252709
In words: drop the first element of `end` and the last element of `start`, then take the vector difference.
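Note that the line above computes `end[i+1] - start[i]`, while the loop in the question computes `start[i+1] - end[i]`. If the latter is what is wanted, the same head/tail idea can be sketched as follows. The column name `gap` and the choice to store the result on the earlier row of each pair (as the loop does) are assumptions for illustration.

```r
# Sample rows copied from the question
dat <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(4847746, 6204636, 6457267),
  end   = c(4847778, 6204673, 6457345)
)

# Drop the first element of start and the last element of end, then subtract:
# element i of the result is start[i + 1] - end[i].
# The trailing NA fills the last row, which has no following row.
dat$gap <- c(tail(dat$start, -1) - head(dat$end, -1), NA)

dat
```

Putting the NA first instead, as in the line above, would store each value on the later row of the pair.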
Upvotes: 3