zola25
zola25

Reputation: 1911

Vector addition with vector indexing

This may well have an answer elsewhere but I'm having trouble formulating the words of the question to find what I need.

I have two dataframes, A and B, with A having many more rows than B. I want to look up a value from B based on a column of A, and add it to another column of A. Something like:

A$ColumnToAdd + B[ColumnToMatch == A$ColumnToMatch,]$ColumnToAdd

But I get, with a load of NAs:

Warning in `==.default`: longer object length is not a multiple of shorter object length

I could do it with a messy for-loop but I'm looking for something faster & elegant.

Thanks

Upvotes: 0

Views: 59

Answers (1)

7hibault
7hibault

Reputation: 2459

If I understood your question correctly, you're looking for a merge or a join, as suggested in the comments.

Here's a simple example for both using dummy data that should fit what you described.

library(tidyverse)

# Some dummy data
ColumnToAdd <- c(1,1,1,1,1,1,1,1)
ColumnToMatch  <- c('a','b','b','b','c','a','c','d')
A <- data.frame(ColumnToAdd, ColumnToMatch)
ColumnToAdd <- c(1,2,3,4)
ColumnToMatch <- c('a','b','c','d')
B <- data.frame(ColumnToAdd, ColumnToMatch)

# Example using merge
A %>% 
  merge(B, by = c("ColumnToMatch")) %>%  
  mutate(sum = ColumnToAdd.x + ColumnToAdd.y)                    

# Example using join
A %>% 
  inner_join(B, by = c("ColumnToMatch")) %>% 
  mutate(sum = ColumnToAdd.x + ColumnToAdd.y)    

The advantages of the dplyr versions over merge are:

  • rows are kept in existing order
  • much faster
  • tells you what keys you're merging by (if you don't supply)
  • also work with database tables.

Upvotes: 1

Related Questions