Reputation: 123
I am trying to replace values in one data frame with values from another data frame.
So, for example:
df1 looks like this:
   Sample aac.2...Ia aac.3..I aac.3..Ia aac.3..Id
1 TG02036          -        -         -         -
2 TG03227          -        -         -         -
3 TG04597          -        -         -         -
4 TG04623          -        -         -         -
5 TG04629          -        -         -         -
I want to replace the matching rows for "Sample" in df1 with "Isolate.Barcode" in df2, which looks like this:
  Isolate.Barcode  Sample aac.2...Ia aac.3..I aac.3..Ia
1          TG2035 TG02036          -        -         -
2          TG1817 TG03227          -        -         -
3          TG1818 TG04597          -        -         -
4          TG1821 TG04623          -        -         -
5          TG1820 TG04629          -        -         -
I'm trying to do this using the DataCombine package with the following code:
df1_corrected <- FindReplace(df1, Var = "Sample", df2,
from = df2$Sample,
to = df2$Isolate.Barcode, exact = TRUE)
I get the following warnings:
Warning messages:
1: In gsub(pattern = paste0("^", replaceData[i, from], "$"), ... :
argument 'pattern' has length > 1 and only the first element will be used
Also, the replacement does not happen.
Thanks for any help you can provide!!
Upvotes: 1
Views: 85
Reputation: 206243
As for getting FindReplace to work, your mistake was that the from/to parameters need to be character strings naming columns in your replaceData data frame, not the columns themselves. So this appears to work:
FindReplace(df1, "Sample", df2, "Sample", "Isolate.Barcode", exact = FALSE)
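To see why column names are needed, here is a rough base R sketch of an exact find-and-replace loop. It is modelled only on the gsub call quoted in the warning message, so treat it as an illustration and not as DataCombine's actual implementation. The data frames are cut-down stand-ins using values from the question, and `out` is a name introduced here.

```r
# Stand-in data; values taken from the question, other columns dropped.
df1 <- data.frame(Sample = c("TG02036", "TG03227"), stringsAsFactors = FALSE)
df2 <- data.frame(Isolate.Barcode = c("TG2035", "TG1817"),
                  Sample = c("TG02036", "TG03227"),
                  stringsAsFactors = FALSE)

from <- "Sample"            # a column NAME in df2, not df2$Sample
to   <- "Isolate.Barcode"   # a column NAME in df2

out <- df1
for (i in seq_len(nrow(df2))) {
  # df2[i, from] is a single string only because `from` names one column;
  # the ^ and $ anchors make the match exact.
  out$Sample <- gsub(paste0("^", df2[i, from], "$"), df2[i, to], out$Sample)
}
out$Sample
```

With `from` set to a single column name, each pattern passed to gsub has length 1, which is the condition the warning complains about.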
Upvotes: 0
Reputation: 56915
I'd use match here, which returns the indices of the matches of one vector in another (see ?match).
First, a reproducible example is always good (i.e. a small example we can copy/paste into R to try things out):
df1 <- data.frame(Sample=letters[1:5], value=1:5, stringsAsFactors=FALSE)
df2 <- data.frame(newID=LETTERS[c(1,3,5,6:10)], Sample=letters[c(1,3,5,6:10)], stringsAsFactors=FALSE)
> df1
  Sample value
1      a     1
2      b     2
3      c     3
4      d     4
5      e     5
> df2
  newID Sample
1     A      a
2     C      c
3     E      e
4     F      f
5     G      g
6     H      h
7     I      i
8     J      j
So here we would expect the Sample column in the final df1 to be A, b, C, d, E (a, c and e being the only values with matches in df2).
First, have a look at what match returns:
match(df1$Sample, df2$Sample)
[1] 1 NA 2 NA 3
For each row in df1, match returns the index of the matching row in the Sample column of df2, or NA where there is no match.
So:
idx <- match(df1$Sample, df2$Sample)
df1$Sample[!is.na(idx)] <- df2$newID[idx[!is.na(idx)]]
> df1
  Sample value
1      A     1
2      b     2
3      C     3
4      d     4
5      E     5
So, as expected, we replaced a, c and e with the corresponding values from the newID column of df2, namely A, C and E.
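The same idea carries over to the column names in the question. Below is a minimal sketch, assuming both Sample columns are character vectors and not factors. The data frames are cut-down stand-ins, and "TG99999" is a made-up sample with no entry in df2, added only to show that unmatched rows are left alone.

```r
# Stand-ins for the asker's data frames (only the ID columns are kept).
# "TG99999" is hypothetical and has no counterpart in df2.
df1 <- data.frame(Sample = c("TG02036", "TG03227", "TG99999"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Isolate.Barcode = c("TG2035", "TG1817"),
                  Sample = c("TG02036", "TG03227"),
                  stringsAsFactors = FALSE)

idx <- match(df1$Sample, df2$Sample)  # row of df2 for each df1 sample, NA if absent
hit <- !is.na(idx)                    # rows of df1 that have a match
df1$Sample[hit] <- df2$Isolate.Barcode[idx[hit]]

df1$Sample
# [1] "TG2035"  "TG1817"  "TG99999"
```

If Sample were a factor, the assignment would produce NA (with a warning) for any barcode that is not already a level, so converting with as.character first is probably the safer route.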
Upvotes: 1