Reputation: 625
I have two data frames in R which look like these.
> df
Var1 Freq
1 0.01 1
2 1 27
3 100 27
4 1000 488
5 2000 4
6 5000 7
7 50000 7
8 100000 1
> return_matrix
Var1 Freq
1 0.00001 0
2 0.01 0
3 1 0
4 8 0
5 100 0
6 200 0
7 500 0
8 1000 0
9 2000 0
10 5000 0
11 10000 0
12 50000 0
13 100000 0
14 200000 0
15 500000 0
16 1000000 0
17 10000000 0
18 100000000 0
19 1000000000 0
I wanted to do like a Vlookup kind of stuff of the above data frames so that the output will look like below:
> combined
Var1 Freq
1 0.00001 0
2 0.01 1
3 1 27
4 8 0
5 100 27
6 200 0
7 500 0
8 1000 488
9 2000 4
10 5000 7
11 10000 0
12 50000 7
13 100000 1
14 200000 0
15 500000 0
16 1000000 0
17 10000000 0
18 100000000 0
I tried to perform a merge in R using the below mentioned R code. However i am not getting the expected results like above. Could you please help me on this?
> combined_matrix <- merge(return_matrix, df, by = "Var1" )
> combined_matrix
Var1 Freq.x Freq.y
1 0.01 0 1
2 1 0 27
3 100 0 27
4 1000 0 488
5 100000 0 1
6 2000 0 4
7 5000 0 7
8 50000 0 7
Upvotes: 1
Views: 599
Reputation: 108603
There's no need to merge anything here.
id <- match(return_matrix$Var1, df$Var1, nomatch = 0L)
return_matrix$Freq[id != 0] <- df$Freq[id]
does the trick.
This code will outperform any merge
or plyr
solution by manifold. It's about 100 times faster than ddply
and about 10 times faster than merge
in the following simulation on my machine:
library(plyr)
library(rbenchmark)
df <- data.frame(
Var1 = c(0.01, 1, 100, 1000, 2000, 5000, 50000,100000),
Freq = c(1,27,27,488,4,7,7,1)
)
return_matrix <- data.frame(
Var1 = c(0.00001,0.01,1,8,100,200,500,1000,2000,5000,
10000,50000,100000,200000,500000,1e6,1e7,1e8),
Freq = 0
)
codeJM <- function(df, return_matrix){
id <- match(return_matrix$Var1, df$Var1, nomatch = 0L)
return_matrix$Freq[id != 0] <- df$Var1[id]
return_matrix
}
codemerge <- function(df, return_matrix){
combined_matrix <- merge(return_matrix, df, by = "Var1" , all = TRUE)
combined_matrix$Freq <- combined_matrix$Freq.x+combined_matrix$Freq.y
combined_matrix$Freq.x <- combined_matrix$Freq.y <- NULL
combined_matrix
}
codeddply <- function(df, return_matrix){
full <- rbind(df,return_matrix)
combined <- ddply(full ,.(Var1),function(x)
data.frame(Var1=x$Var1[1],Freq=sum(x$Freq)))
combined
}
benchmark(
codemerge(df, return_matrix),
codeJM(df, return_matrix),
codeddply(df, return_matrix),
replications = 1000
)
gives :
test replications elapsed relative user.self
3 codeddply(df, return_matrix) 1000 5.38 107.6 5.37
2 codeJM(df, return_matrix) 1000 0.05 1.0 0.05
1 codemerge(df, return_matrix) 1000 0.51 10.2 0.52
Upvotes: 2
Reputation: 479
You can still use merge
by selecting all.x = TRUE
, this keeps all rows:
c<-merge(return_matrix, df, by = "Var1", all.x = TRUE)
This will create a second freq column, but you can clean it fairly easily
c<-c[, -2]
c[,2][which(is.na(c[,2]))]<- 0
Upvotes: 1
Reputation: 402
You can either work with the match function Joris suggests or use the ddply function from the plyr package:
library(plyr)
full <- rbind(df,return_matrix)
combined <- ddply(full ,.(Var1),function(x) data.frame(Var1=x$Var1[1],Freq=sum(x$Freq)))
This will sum the values from Freq even if they are not 0 in return_matrix
Upvotes: 1