Steve
Steve

Reputation: 598

R multiply unequal dataframes based on specific column values

Updated question to include extra columns that pass from df1 to output, as well as add that the main df has 7 million rows.


I have two dataframes, similar to Pandas SO Question

And I need to multiply them where the names match. I'm not sure how to accomplish this cleanly. is there an apply function that does this easily?

DF1: (there are 16 columns of extra data, 7 million rows long)

Data1   Data2   Name      Value
aa      bb      sample1   50
ff      ff      sample1   100
ef      fd      sample1   75
ff      df      sample2   100
bbf     ad3     sample2   200
dd      a       sample2   300
33      3rf     sample3   25
ddd     dd      sample3   50
dd      dd      sample3   40

DF2:

Name      Value
sample1   1
sample2   0.5
sample3   2

OUTPUT: (With the additional 16 columns that aren't shown)

Data1   Data2   Name      Value
aa      bb      sample1   50
ff      ff      sample1   100
ef      fd      sample1   75
ff      df      sample2   50
bbf     ad3     sample2   100
dd      a       sample2   150
33      3rf     sample3   50
ddd     dd      sample3   100
dd      dd      sample3   80

Upvotes: 0

Views: 384

Answers (4)

ThomasIsCoding
ThomasIsCoding

Reputation: 102309

You can try the base R code below, using merge

DF1$Value <- do.call(`*`,merge(DF1[c("Name","Value")],DF2,all = TRUE,by="Name")[-1])

such that

> DF1
  Data1 Data2    Name Value
1    aa    bb sample1    50
2    ff    ff sample1   100
3    ef    fd sample1    75
4    ff    df sample2    50
5   bbf   ad3 sample2   100
6    dd     a sample2   150
7    33   3rf sample3    50
8   ddd    dd sample3   100
9    dd    dd sample3    80

DATA

DF1 <- structure(list(Data1 = c("aa", "ff", "ef", "ff", "bbf", "dd", 
"33", "ddd", "dd"), Data2 = c("bb", "ff", "fd", "df", "ad3", 
"a", "3rf", "dd", "dd"), Name = c("sample1", "sample1", "sample1", 
"sample2", "sample2", "sample2", "sample3", "sample3", "sample3"
), Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L)), class = "data.frame", row.names = c(NA, 
-9L))

DF2 <- structure(list(Name = c("sample1", "sample2", "sample3"), Value = c(1, 
0.5, 2)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

B. Christian Kamgang
B. Christian Kamgang

Reputation: 6519

You can use data.table package:

library(data.table)
setDT(df1)[setDT(df2), Value_new := Value * i.Value, on = "Name"]

#     Data1  Data2    Name Value Value_new
# 1:     aa     bb sample1    50        50
# 2:     ff     ff sample1   100       100
# 3:     ef     fd sample1    75        75
# 4:     ff     df sample2   100        50
# 5:    bbf    ad3 sample2   200       100
# 6:     dd      a sample2   300       150
# 7:     33    3rf sample3    25        50
# 8:    ddd     dd sample3    50       100
# 9:     dd     dd sample3    40        80

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389155

We can join both the dataframes using a left_join or an inner_join and then multiply the corresponding Value columns with one another. Using dplyr that can be done as :

library(dplyr)

inner_join(df1, df2, by = 'Name') %>%
   mutate(Value = Value.x * Value.y) %>%
   select(names(df1))

#     Name Value
#1 sample1    50
#2 sample1   100
#3 sample1    75
#4 sample2    50
#5 sample2   100
#6 sample2   150
#7 sample3    50
#8 sample3   100
#9 sample3    80

data

df1 <- structure(list(Name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("sample1", "sample2", "sample3"), class = "factor"), 
Value = c(50L, 100L, 75L, 100L, 200L, 300L, 25L, 50L, 40L
)), class = "data.frame", row.names = c(NA, -9L))

df2 <- structure(list(Name = structure(1:3, .Label = c("sample1", "sample2", 
"sample3"), class = "factor"), Value = c(1, 0.5, 2)), class = "data.frame", 
row.names = c(NA, -3L))

Upvotes: 2

d.b
d.b

Reputation: 32548

Most straightforward would be to use match to get the row indices of df2 in proper order.

df2$Value[match(df1$Name, df2$Name)] * df1$Value

You could also convert df2 to vector with names based on Name column. Then subset values from it using the Name column of df1.

df1$Value * setNames(df2$Value, df2$Name)[df1$Name]

Upvotes: 2

Related Questions