MOMO
MOMO

Reputation: 57

R: creating a new column in a data.frame based on values out of a second data.frame

I have two (example) data.frames (df1, df2)

#df1
L <- LETTERS[1:4]
b <- sample(L, 20, replace = TRUE)
df1 <- data.frame(stuff1 = 1, stuff2 = 1:10, b = b, c= NA, stringsAsFactors=FALSE)

#df2
a <- c(10,20,30,40)
df2 <- data.frame(xx = L, yy = a, stringsAsFactors=FALSE )

i want to have a new column, let's say c, in df1 based on the values out of df2. One example: A has the corresponding value of 10 (see df2) so for every A in column b of df1 should be 10 written down in the (new) line c. And this for every row of xx in df2, so in this case A,B,C and D. My code is not really working and is based only for a single value, here A:

##copy column b now it is c
df1["c"] <- df1$b


# which value has A in df2?
zz <- df2[df2$xx == "A",]
xy <- zz$yy


# fill in the new value in c (this is not working)
df1[df1$c == "A", ] <- xy 

i hope it is clear what i want to say... oh and i have some big data this is only an example to try out...

Upvotes: 0

Views: 2383

Answers (1)

andybega
andybega

Reputation: 1437

It sounds like you just want to do a merge/join. First, let's drop the empty c in df1 and change the column names a bit:

 df1 <- df1[, !names(df1) %in% "c"]
 colnames(df2) <- c("b", "c")

With just base R, we can use merge:

 df3 <- merge(df1, df2, by="b", type="left")
 head(df3)

  b stuff1 stuff2  c
1 A      1      1 10
2 A      1      2 10
3 A      1      3 10
4 A      1      3 10
5 A      1     10 10
6 A      1      7 10

The package plyr has an alternative that might be faster and will preserve column order:

library(plyr)
df4 <- join(df1, df2, by="b", type="left")
head(df4)

  stuff1 stuff2 b  c
1      1      1 A 10
2      1      2 A 10
3      1      3 A 10
4      1      4 B 20
5      1      5 B 20
6      1      6 B 20

I don't know how well that will scale with the size of your data, but if it doesn't, you could try data.table or sqldf. I haven't used these two so I can't speak much to them, but here's a comparison of their speed that might be a good starting point.

Upvotes: 1

Related Questions