pidoretroma

Reputation: 75

difference of values in different columns in consecutive rows

I have a data.frame df with 5 columns and around 10000 rows.

For each pair of consecutive rows, I am trying to subtract the value of column 2 in row i+1 from the value of column 3 in row i, and write the result to a new column called 'diff'.

The df looks like:

`  chr   start     end    TBX21 width 
1 chr1 4847746 4847778 53.37334    32
2 chr1 6204636 6204673 33.70947    37      
3 chr1 6457267 6457345 31.83673    78

`

I tried: `

length = length(df[[1]])-1

for (i in 1:length) {
  df$diff = df[i+1, 2] - df[i,3];
}

` and what I get is:

`chr   start     end    TBX21 width diff
1 chr1 4847746 4847778 53.37334    32      9229
2 chr1 6204636 6204673 33.70947    37      9229
3 chr1 6457267 6457345 31.83673    78      9229
4 chr1 7078778 7078822 39.32772    44      9229`

I can't figure out my mistake. And yes, I'm a beginner in R.

Upvotes: 1

Views: 1700

Answers (2)

Bernardo

Reputation: 426

The problem is that you are assigning the difference to every row at once, because you forgot to index your diff variable as well.

Replace df$diff with df$diff[i] and it should work.
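A minimal sketch of the corrected loop, using the first three rows from the question as example data. Creating the diff column with NA before the loop is my own addition, not part of the original code: it keeps the last row (which has no following row) as NA and avoids relying on how R fills a column that does not exist yet.

```r
# Example data copied from the question (first three rows only)
df <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(4847746, 6204636, 6457267),
  end   = c(4847778, 6204673, 6457345)
)

# Assumption: create the column up front so every row starts as NA
df$diff <- NA_real_

# Row i gets: column 2 (start) of row i+1 minus column 3 (end) of row i
for (i in seq_len(nrow(df) - 1)) {
  df$diff[i] <- df[i + 1, 2] - df[i, 3]
}

df$diff
```

seq_len(nrow(df) - 1) is used instead of 1:length so the loop is simply skipped if the data frame has only one row.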

However, explicit looping in R is not always the best option, especially with large data sets. @Andrie's answer covers the vectorized approach well. If you have a small or medium-sized dataset, I'd keep the loop, as it is easier to read.

Upvotes: 1

Andrie

Reputation: 179398

You can achieve this in a vectorised way, i.e. without using an explicit loop.

For example:

dat$diff <- c(NA, tail(dat$end, -1) - head(dat$start, -1))
dat

   chr   start     end    TBX21 width    diff
1 chr1 4847746 4847778 53.37334    32      NA
2 chr1 6204636 6204673 33.70947    37 1356927
3 chr1 6457267 6457345 31.83673    78  252709

In words: drop the first element of end and the last element of start, then take the vector difference.
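Note that the line above computes end of row i+1 minus start of row i. If what you want is the quantity from the loop in the question (start of row i+1 minus end of row i), a sketch along the same lines could look like this. The column name gap and the choice to put the NA in the last row are assumptions made for illustration.

```r
# Example data copied from the question (first three rows only)
dat <- data.frame(
  chr   = c("chr1", "chr1", "chr1"),
  start = c(4847746, 6204636, 6457267),
  end   = c(4847778, 6204673, 6457345)
)

# Drop the first start and the last end, subtract element-wise,
# then pad with NA because the last row has no following row
dat$gap <- c(dat$start[-1] - dat$end[-nrow(dat)], NA)

dat$gap
```

Negative indexing (x[-1], x[-nrow(dat)]) does the same job here as tail(x, -1) and head(x, -1).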

Upvotes: 3
