Reputation: 753
I have two data frames. One has only 1 row and 3 columns; the other has 6 rows and 3 columns. I want to subtract the values in data frame 1 from every row of data frame 2, column by column.
Sample data:
df1 = structure(list(col1 = 2L, col2 = 3L, col3 = 4L), .Names = c("col1",
"col2", "col3"), class = "data.frame", row.names = c(NA, -1L))
df2 = structure(list(col1 = c(1L, 2L, 4L, 5L, 6L, 3L), col2 = c(1L,
2L, 4L, 3L, 5L, 7L), col3 = c(6L, 4L, 3L, 6L, 4L, 6L)), .Names = c("col1", "col2", "col3"), class = "data.frame", row.names = c(NA, -6L))
The final output should look like this:
output = structure(list(col1 = c(-1L, 0L, 2L, 3L, 4L, 1L), col2 = c(-2L,
-1L, 1L, 0L, 2L, 4L), col3 = c(2L, 0L, -1L, 2L, 0L, 2L)), .Names = c("col1","col2", "col3"), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 3
Views: 735
Reputation: 56149
We can use sweep:
x <- sweep(df2, 2, unlist(df1), "-")
# test whether the result is the same as output
identical(output, x)
# [1] TRUE
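For intuition about what sweep does with MARGIN = 2, here is a minimal sketch on the same numbers stored as a matrix. The names m, stats, by_sweep and by_hand are made up for illustration. It subtracts one value per column, which can also be written by hand with two transposes, relying on R's column-wise recycling.

```r
# Same numbers as df2 and df1, rebuilt here so the snippet runs on its own
m <- cbind(col1 = c(1L, 2L, 4L, 5L, 6L, 3L),
           col2 = c(1L, 2L, 4L, 3L, 5L, 7L),
           col3 = c(6L, 4L, 3L, 6L, 4L, 6L))
stats <- c(2L, 3L, 4L)  # one value per column (illustrative name)

# MARGIN = 2 means "one statistic per column"
by_sweep <- sweep(m, 2, stats, "-")

# Hand-rolled equivalent: after t(), each original row becomes a column,
# so the length-3 vector is recycled once per original row
by_hand <- t(t(m) - stats)

# Compare the two results element by element
all(by_sweep == by_hand)
```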
Note that in this benchmark it is roughly twice as slow as mapply:
df2big <- data.frame(col1 = runif(100000),
col2 = runif(100000),
col3 = runif(100000))
microbenchmark::microbenchmark(
mapply = data.frame(mapply("-", df2big, df1)),
sapply = data.frame(sapply(names(df1), function(i){df2big[[i]] - df1[[i]]})),
sweep = sweep(df2big, 2, unlist(df1), "-"))
# Unit: milliseconds
# expr min lq mean median uq max neval
# mapply 5.239638 7.645213 11.49182 8.514876 9.345765 60.60949 100
# sapply 5.250756 5.518455 10.94827 8.706027 10.091841 59.09909 100
# sweep 10.572785 13.912167 21.18537 14.985525 16.737820 64.90064 100
Upvotes: 2
Reputation: 509
Try this:
# Creating Datasets
df1 = structure(list(col1 = 2L, col2 = 3L, col3 = 4L), .Names = c("col1", "col2", "col3"), class = "data.frame", row.names = c(NA, -1L))
df2 = structure(list(col1 = c(1L, 2L, 4L, 5L, 6L, 3L), col2 = c(1L,2L, 4L, 3L, 5L, 7L), col3 = c(6L, 4L, 3L, 6L, 4L, 6L)), .Names = c("col1", "col2", "col3"), class = "data.frame", row.names = c(NA, -6L))
# Output
data.frame(sapply(names(df1), function(i){df2[[i]] - df1[[i]]}))
# col1 col2 col3
# 1 -1 -2 2
# 2 0 -1 0
# 3 2 1 -1
# 4 3 0 2
# 5 4 2 0
# 6 1 4 2
Upvotes: 4
Reputation: 388907
If you do df2 - df1 directly, you get
df2 - df1
Error in Ops.data.frame(df2, df1) : ‘-’ only defined for equally-sized data frames
So let us make df1 the same size as df2 by repeating its row, and then subtract:
df2 - df1[rep(seq_len(nrow(df1)), nrow(df2)), ]
# col1 col2 col3
#1 -1 -2 2
#2 0 -1 0
#3 2 1 -1
#4 3 0 2
#5 4 2 0
#6 1 4 2
Another option is to use mapply, which avoids replicating rows:
mapply("-", df2, df1)
This returns a matrix. If you want a data frame back:
data.frame(mapply("-", df2, df1))
# col1 col2 col3
#1 -1 -2 2
#2 0 -1 0
#3 2 1 -1
#4 3 0 2
#5 4 2 0
#6 1 4 2
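If you would rather not go through a matrix at all, a sketch along the same lines uses Map, which pairs up columns the way mapply does but always returns a list. The variable name res is only for illustration, and this variant is not part of the timings reported elsewhere in the thread. Note that columns are matched by position, not by name, so it assumes both data frames have their columns in the same order.

```r
# Sample data, rebuilt here so the snippet runs on its own
df1 <- data.frame(col1 = 2L, col2 = 3L, col3 = 4L)
df2 <- data.frame(col1 = c(1L, 2L, 4L, 5L, 6L, 3L),
                  col2 = c(1L, 2L, 4L, 3L, 5L, 7L),
                  col3 = c(6L, 4L, 3L, 6L, 4L, 6L))

# Map() walks over the columns of df2 and df1 in parallel and returns a
# named list; as.data.frame() turns that list back into a data frame
res <- as.data.frame(Map(`-`, df2, df1))
res
```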
Upvotes: 3