Reputation: 311
I have a data frame (in reality it has 170 columns (85 pairs) and ~8000 rows):
data <- data.frame(A = c(6,5,4,3), B = c(2,2,2,2), C = c(9,8,7,6), D = c(2,2,2,2))
I would like to subtract column 2 from column 1, column 4 from column 3, etc. for all rows.
I think I need to either try to write a function or use apply in some way.
Upvotes: 2
Views: 1023
Reputation: 93813
R has vectorised operations to deal with this kind of task in a single call:
data[c(1,3)] - data[c(2,4)]
## or for every column until the end of the dataset
data[seq(1,ncol(data),2)] - data[seq(2,ncol(data),2)]
# A C
#1 4 7
#2 3 6
#3 2 5
#4 1 4
See this previous discussion for lots of useful advice - Selecting multiple odd or even columns/rows for dataframe
You can extend this so the naming is done automatically:
s <- seq(1,ncol(data),2)
data[paste0(names(data[s]), "minus", names(data)[-s])] <- data[s] - data[-s]
data
# A B C D AminusB CminusD
#1 6 2 9 2 4 7
#2 5 2 8 2 3 6
#3 4 2 7 2 2 5
#4 3 2 6 2 1 4
Upvotes: 6
Reputation: 10223
Many basic operations on data.frames are vectorized, meaning that addition, subtraction, multiplication, etc., are element-wise. I.e. the following works:
data <- data.frame(A = c(6,5,4,3), B = c(2,2,2,2), C = c(9,8,7,6), D = c(2,2,2,2))
data$AminusB <- data$A - data$B
data$CminusD <- data$C - data$D
print(data)
# A B C D AminusB CminusD
#1 6 2 9 2 4 7
#2 5 2 8 2 3 6
#3 4 2 7 2 2 5
#4 3 2 6 2 1 4
You can also access column 4, say, with data[4] (a one-column data frame), data[,4] or data[,"D"] (both plain vectors), and more. See help("["). Depending on how you want your output, there are many ways to do it. With a simple for-loop you can loop over your columns and compute all the differences.
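The for-loop idea could look roughly like the sketch below. It assumes the columns come in adjacent pairs (1-2, 3-4, ...) and that the total column count is even; the "minus" naming scheme and the names diffs, new_name and result are only illustrative choices, not something from the question.

```r
# Toy data from the question; columns are assumed to come in adjacent pairs
data <- data.frame(A = c(6,5,4,3), B = c(2,2,2,2), C = c(9,8,7,6), D = c(2,2,2,2))

# Guard: pairing only makes sense with an even number of columns
stopifnot(ncol(data) %% 2 == 0)

diffs <- list()  # collects one difference column per pair
for (i in seq(1, ncol(data), by = 2)) {
  # Illustrative naming scheme: "<first>minus<second>"
  new_name <- paste0(names(data)[i], "minus", names(data)[i + 1])
  # Column i minus column i + 1, element-wise over all rows
  diffs[[new_name]] <- data[[i]] - data[[i + 1]]
}

result <- as.data.frame(diffs)
print(result)
```

With 85 pairs this loop runs only 85 times, since each iteration subtracts whole columns at once, so it should stay fast even for ~8000 rows.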
Upvotes: 3
Reputation: 28825
Just another approach using apply:
-t(apply(data, 1, diff))[ , seq(1, ncol(data)-1, by=2)]
# B D
# [1,] 4 7
# [2,] 3 6
# [3,] 2 5
# [4,] 1 4
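Note that the result here is a matrix whose columns carry the name of the second column of each pair. If a data frame with more descriptive names is wanted, something like the following sketch might do; the "minus" naming and the variable names m, first and res are assumptions made for illustration.

```r
data <- data.frame(A = c(6,5,4,3), B = c(2,2,2,2), C = c(9,8,7,6), D = c(2,2,2,2))

# Row-wise adjacent differences, keeping only every other one (B - A, D - C),
# then flipping the sign to get A - B and C - D
m <- -t(apply(data, 1, diff))[, seq(1, ncol(data) - 1, by = 2)]

# Rename the columns after both members of each pair
first <- seq(1, ncol(data), by = 2)
colnames(m) <- paste0(names(data)[first], "minus", names(data)[first + 1])

# Convert the matrix back to a data frame
res <- as.data.frame(m)
print(res)
```

Since diff computes every adjacent difference (including C - B, which is then discarded), this does roughly twice the necessary arithmetic compared with subtracting the column subsets directly.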
Upvotes: 2
Reputation: 4863
Having 170 columns, specifying every column name would be daunting. If all of your columns are numeric, you can do this:
#Sample data
set.seed(123)
df <- data.frame(x = floor(rnorm(5, 10, 2)),
y = floor(rnorm(5, 30, 2)),
z = floor(rnorm(5, 50, 2)))
x y z
1 8 33 52
2 9 30 50
3 13 27 50
4 10 28 50
5 10 29 48
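Subtracting each column from the one after it (y - x, z - y). Note that this gives differences of adjacent columns rather than of non-overlapping pairs: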
df[-1] - df[-ncol(df)]
y z
1 25 19
2 21 20
3 14 23
4 18 22
5 19 19
Upvotes: 2
Reputation: 263332
You can choose every other column with c(TRUE,FALSE) or its negation, because the logical index is recycled across the columns. The binary minus has a data frame method:
data[c(TRUE,FALSE)] - data[c(FALSE,TRUE)]
A C
1 4 7
2 3 6
3 2 5
4 1 4
If you wanted to name them meaningfully, you could use paste0 on the names (plain paste would insert spaces around "_minus_" unless you set sep = ""):
paste0( names(data[c(TRUE,FALSE)]) , "_minus_", names( data[c(FALSE,TRUE)]) )
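As a sketch of how such names could be attached to the result, one option is setNames from the stats package (loaded by default). paste0 is used here so that no separator spaces end up in the names; the variables odd, even, new_names and diffs are illustrative.

```r
data <- data.frame(A = c(6,5,4,3), B = c(2,2,2,2), C = c(9,8,7,6), D = c(2,2,2,2))

odd  <- c(TRUE, FALSE)   # recycled: selects columns 1, 3, ...
even <- c(FALSE, TRUE)   # recycled: selects columns 2, 4, ...

# Build one name per pair, e.g. "A_minus_B"
new_names <- paste0(names(data)[odd], "_minus_", names(data)[even])

# Subtract the even columns from the odd ones and rename in one step
diffs <- setNames(data[odd] - data[even], new_names)
print(diffs)
```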
Upvotes: 4