micro_gnomics

Replacing IDs in one data frame with those from another data frame

I am trying to replace the IDs in one column of my data frame with IDs taken from another data frame.

So, for example:

df1 looks like this:

   Sample aac.2...Ia aac.3..I aac.3..Ia aac.3..Id
1 TG02036          -        -         -         -
2 TG03227          -        -         -         -
3 TG04597          -        -         -         -
4 TG04623          -        -         -         -
5 TG04629          -        -         -         -

For every row of df1 whose Sample also appears in df2, I want to replace that Sample value with the corresponding Isolate.Barcode from df2, which looks like this:

  Isolate.Barcode  Sample aac.2...Ia aac.3..I aac.3..Ia
1          TG2035 TG02036          -        -         -
2          TG1817 TG03227          -        -         -
3          TG1818 TG04597          -        -         -
4          TG1821 TG04623          -        -         -
5          TG1820 TG04629          -        -         -

I'm trying to do this using the DataCombine package with the following code:

df1_corrected <- FindReplace(df1, Var = "Sample", df2,
                             from = df2$Sample,
                             to = df2$Isolate.Barcode, exact = TRUE)

I get the following warnings:

Warning messages:
1: In gsub(pattern = paste0("^", replaceData[i, from], "$"),  ... :
  argument 'pattern' has length > 1 and only the first element will be used

Also, the replacement does not happen.

Thanks for any help you can provide!

Answers (2)

MrFlick

To get FindReplace working: your mistake was passing the column vectors themselves as from and to. Those arguments must be the names of columns in your replaceData data frame, given as character strings. So this appears to work:

library(DataCombine)
FindReplace(df1, Var = "Sample", replaceData = df2, from = "Sample",
            to = "Isolate.Barcode", exact = FALSE)
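To illustrate why from and to have to be column names, here is a rough base-R sketch of a row-by-row find-and-replace of the kind the warning message hints at (it mentions gsub and replaceData[i, from]). The function find_replace_sketch is hypothetical and is not DataCombine's actual source. It assumes the IDs are plain character strings with no regex metacharacters, because each ID is used as a regular expression.

```r
# Hypothetical helper, NOT DataCombine's implementation: each row of
# replaceData supplies one pattern (column `from`) and one replacement
# (column `to`), which is why `from`/`to` are column names, not vectors.
find_replace_sketch <- function(data, Var, replaceData, from, to, exact = TRUE) {
  for (i in seq_len(nrow(replaceData))) {
    pattern <- as.character(replaceData[i, from])    # one string per row
    if (exact) pattern <- paste0("^", pattern, "$")   # anchor: whole-value match only
    data[[Var]] <- gsub(pattern, as.character(replaceData[i, to]), data[[Var]])
  }
  data
}

# Small example built from the first three rows printed in the question
df1 <- data.frame(Sample = c("TG02036", "TG03227", "TG04597"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Isolate.Barcode = c("TG2035", "TG1817", "TG1818"),
                  Sample = c("TG02036", "TG03227", "TG04597"),
                  stringsAsFactors = FALSE)

out <- find_replace_sketch(df1, "Sample", df2, "Sample", "Isolate.Barcode")
print(out$Sample)
```

Because the rows are applied one after another, a replacement value that happens to equal a later pattern would be replaced again in this sketch; a single match lookup does not have that problem.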

mathematical.coffee

I'd use match here: for each element of its first argument, it returns the position of its first match in the second argument (see ?match).

First, a reproducible example is always good (i.e. a small example we can copy and paste into R to try things out):

df1 <- data.frame(Sample = letters[1:5], value = 1:5, stringsAsFactors = FALSE)
df2 <- data.frame(newID = LETTERS[c(1, 3, 5, 6:10)], Sample = letters[c(1, 3, 5, 6:10)], stringsAsFactors = FALSE)
> df1
  Sample value
1      a     1
2      b     2
3      c     3
4      d     4
5      e     5
> df2
  newID Sample
1     A      a
2     C      c
3     E      e
4     F      f
5     G      g
6     H      h
7     I      i
8     J      j

So here we would expect the Sample column of the final df1 to be A, b, C, d, E, since a, c and e are the only values of df1$Sample that also appear in df2$Sample.

Now have a look at:

match(df1$Sample, df2$Sample)
[1]  1 NA  2 NA  3

For each element of df1$Sample, it returns the index of the matching element of df2$Sample, or NA if there is no match. So:

idx <- match(df1$Sample, df2$Sample)
# replace only the rows that found a match; unmatched values are left alone
df1$Sample[!is.na(idx)] <- df2$newID[idx[!is.na(idx)]]

> df1
  Sample value
1      A     1
2      b     2
3      C     3
4      d     4
5      E     5

So, as expected, a, c and e were replaced by the corresponding newID values from df2 (A, C and E), while b and d were left unchanged.
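Applied to the data in the question, the same two steps might look like the sketch below. The data frames are rebuilt from the five rows printed in the question, keeping only the Sample and Isolate.Barcode columns (the aac.* columns are left out for brevity), so the column names are the asker's and the values are exactly those shown.

```r
# df1 and df2 rebuilt from the rows printed in the question (ID columns only)
df1 <- data.frame(
  Sample = c("TG02036", "TG03227", "TG04597", "TG04623", "TG04629"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Isolate.Barcode = c("TG2035", "TG1817", "TG1818", "TG1821", "TG1820"),
  Sample = c("TG02036", "TG03227", "TG04597", "TG04623", "TG04629"),
  stringsAsFactors = FALSE
)

# Position of each df1$Sample within df2$Sample (NA when it is absent)
idx <- match(df1$Sample, df2$Sample)
# Overwrite only the matched rows; IDs without a match keep their old value
hit <- !is.na(idx)
df1$Sample[hit] <- df2$Isolate.Barcode[idx[hit]]
print(df1$Sample)
```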
