Reputation: 598
Updated question to include extra columns that pass from df1 to output, as well as add that the main df has 7 million rows.
I have two dataframes, similar to Pandas SO Question
And I need to multiply them where the names match. I'm not sure how to accomplish this cleanly. is there an apply function that does this easily?
DF1: (there are 16 columns of extra data, 7 million rows long)
Data1 Data2 Name Value
aa bb sample1 50
ff ff sample1 100
ef fd sample1 75
ff df sample2 100
bbf ad3 sample2 200
dd a sample2 300
33 3rf sample3 25
ddd dd sample3 50
dd dd sample3 40
DF2:
Name Value
sample1 1
sample2 0.5
sample3 2
OUTPUT: (With the additional 16 columns that aren't shown)
Data1 Data2 Name Value
aa bb sample1 50
ff ff sample1 100
ef fd sample1 75
ff df sample2 50
bbf ad3 sample2 100
dd a sample2 150
33 3rf sample3 50
ddd dd sample3 100
dd dd sample3 80
Upvotes: 0
Views: 384
Reputation: 102309
You can try the base R code below, using merge
DF1$Value <- do.call(`*`,merge(DF1[c("Name","Value")],DF2,all = TRUE,by="Name")[-1])
such that
> DF1
Data1 Data2 Name Value
1 aa bb sample1 50
2 ff ff sample1 100
3 ef fd sample1 75
4 ff df sample2 50
5 bbf ad3 sample2 100
6 dd a sample2 150
7 33 3rf sample3 50
8 ddd dd sample3 100
9 dd dd sample3 80
DATA
DF1 <- structure(list(Data1 = c("aa", "ff", "ef", "ff", "bbf", "dd",
"33", "ddd", "dd"), Data2 = c("bb", "ff", "fd", "df", "ad3",
"a", "3rf", "dd", "dd"), Name = c("sample1", "sample1", "sample1",
"sample2", "sample2", "sample2", "sample3", "sample3", "sample3"
), Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L)), class = "data.frame", row.names = c(NA,
-9L))
DF2 <- structure(list(Name = c("sample1", "sample2", "sample3"), Value = c(1,
0.5, 2)), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1
Reputation: 6519
You can use data.table
package:
library(data.table)
setDT(df1)[setDT(df2), Value_new := Value * i.Value, on = "Name"]
# Data1 Data2 Name Value Value_new
# 1: aa bb sample1 50 50
# 2: ff ff sample1 100 100
# 3: ef fd sample1 75 75
# 4: ff df sample2 100 50
# 5: bbf ad3 sample2 200 100
# 6: dd a sample2 300 150
# 7: 33 3rf sample3 25 50
# 8: ddd dd sample3 50 100
# 9: dd dd sample3 40 80
Upvotes: 1
Reputation: 389155
We can join both the dataframes using a left_join
or an inner_join
and then multiply the corresponding Value
columns with one another. Using dplyr
that can be done as :
library(dplyr)
inner_join(df1, df2, by = 'Name') %>%
mutate(Value = Value.x * Value.y) %>%
select(names(df1))
# Name Value
#1 sample1 50
#2 sample1 100
#3 sample1 75
#4 sample2 50
#5 sample2 100
#6 sample2 150
#7 sample3 50
#8 sample3 100
#9 sample3 80
data
df1 <- structure(list(Name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("sample1", "sample2", "sample3"), class = "factor"),
Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L
)), class = "data.frame", row.names = c(NA, -9L))
df2 <- structure(list(Name = structure(1:3, .Label = c("sample1", "sample2",
"sample3"), class = "factor"), Value = c(1, 0.5, 2)), class = "data.frame",
row.names = c(NA, -3L))
Upvotes: 2
Reputation: 32548
Most straightforward would be to use match
to get the row indices of df2
in proper order.
df2$Value[match(df1$Name, df2$Name)] * df1$Value
You could also convert df2
to vector with names based on Name
column. Then subset values from it using the Name
column of df1
.
df1$Value * setNames(df2$Value, df2$Name)[df1$Name]
Upvotes: 2