smack
smack

Reputation: 259

get values from data frame to another

I'm new to R programming and just started to learn it, and i need you to help me with this issue please.

I have 2 data frames :

the first(df1):

    V1 V2
    A  A 
    A  B 
    A  C 
    B  A 
    B  B 
    B  C 

etc

the second(df2) :

V1  Va   Vb
A   12   23
B   15   53
C   321  543
D   54   325
etc..

Use this code to generate the sample data.

df1 <- data.frame(
  V1 = rep(LETTERS[1:2], each = 3L),
  V2 = rep.int(LETTERS[1:3], 2L)
)
dfr2 <- data.frame(
  Va = c(12, 15, 312, 54),
  Vb = c(23, 53, 543, 325)
)

I need to take the Va and Vb from df2 and place them in df1 based on V1 and V2 of df1.

so I want this output:

df3:

V1   V2  Va1  Vb1 Va2 Vb2
A    A   12    23  12  23
A    B   12    23  15  23
A    C   12    23  321 543
B    A   15    23  12   23
B    B   15    23  15   23
B    C   15    23  321  543

hope that this can be done in R without a dozenz of for loops :S.

Upvotes: 1

Views: 3534

Answers (3)

BarkleyBG
BarkleyBG

Reputation: 664

I think Hadley Wickham's dplyr package has some great tools for combining dataframes. It's a different way to arrive at the same place.

df1 <- data.frame(
  V1 = rep(LETTERS[1:2], each = 3L),
  V2 = rep.int(LETTERS[1:3], 2L)
)
dfr2 <- data.frame(
  V1 = LETTERS[1:4],
  Va = c(12, 15, 312, 54),
  Vb = c(23, 53, 543, 325)
)

##necessary libraries
library(magrittr, dplyr)

Now, using the left_join() and rename() functions from the dplyr package, and the pipe operator from the magrittr package, I think you can save a lot of keystrokes.

df3 <- df1 %>% #pipe operator (%>%) is from magrittr package
  dplyr::left_join(dfr2, c('V1'="V1")) %>% #merge once
  dplyr::rename(Va1 = Va, Vb1 = Vb) %>%  #rename columns
  dplyr::left_join(dfr2, c('V2'="V1")) %>% #merge on different column
  dplyr::rename(Va2 = Va, Vb2 = Vb) #rename again

This can also be done with one merge instead of two, but I like to see what's going on in each step.

> print(df3)
  V1 V2 Va1 Vb1 Va2 Vb2
1  A  A  12  23  12  23
2  A  B  12  23  15  53
3  A  C  12  23 312 543
4  B  A  15  53  12  23
5  B  B  15  53  15  53
6  B  C  15  53 312 543

Upvotes: 1

Joris Meys
Joris Meys

Reputation: 108543

You can use the power of match() for this, provided the names in df1$V1 are unique :

#some data
df1 <- data.frame(
  V1 = rep(c("A","B"),each=3),
  V2 = rep(LETTERS[1:3],2)  
)
df2 <- data.frame(V1=LETTERS[1:3],Va=1:3,Vb=3:1)

out <- cbind(df1,
  df2[match(df1$V1,df2$V1),-1],
  df2[match(df1$V2,df2$V1),-1]
)
names(out)[3:6] <- c("Va1","Vb1","Va2","Vb2")
rownames(out) <- 1:nrow(out)

Gives

> out
    V1 V2 Va1 Vb1 Va2 Vb2
1    A  A   1   3   1   3
2    A  B   1   3   2   2
3    A  C   1   3   3   1
4    B  A   2   2   1   3
5    B  B   2   2   2   2
6    B  C   2   2   3   1

You'll have to rename the columns manually, as you would get multiple columns with the same name. Although it's technically possible in a dataframe, it can cause trouble later on. You can automatize this using something along the lines of:

names(out) <- 
    c("V1","V2",
      sapply(names(df2)[2:3],paste,1:2,sep="")
    )

EDIT : for big dataframes, conversion to matrices makes another huge difference. One has to pay attention to the intrinsic changes in type for the different variables. The speedup is due to the fact that cbind and merge take a whole lot of time figuring out the correct types for each variable.

With following data and functions :

n <- 1e5
df1 <- data.frame(V1 = rep(LETTERS,each=n),V2 = rep(LETTERS,n),
        stringsAsFactors=FALSE)
df2 <- data.frame(V1=LETTERS,Va=1:26,Vb=26:1,stringsAsFactors=FALSE)

fast_JM <- function(df1,df2){
  out <- cbind(
    as.matrix(df2[,-1])[match(df1$V1,df2$V1),],
    as.matrix(df2[,-1])[match(df1$V2,df2$V1),]
  )
  out <- as.data.frame(out)
  names(out) <- sapply(names(df2)[2:3],paste,1:2,sep="")
  out$V1 <- df1$V1
  out$V2 <- df1$V2
  out
}

slow_JM <- function(df1,df2){
  out <- cbind(df1,
    df2[match(df1$V1,df2$V1),-1],
    df2[match(df1$V2,df2$V1),-1]
  )
  names(out)[3:6] <- c("Va1","Vb1","Va2","Vb2")
  out
}


double_merge <- function(df1,df2){
  merge(merge(df1, df2), df2, by.x = "V2", by.y = "V1", suffixes = c("1", "2"))

}

the benchmarking becomes :

require(rbenchmark)
benchmark(fast_JM(df1,df2),slow_JM(df1,df2),double_merge(df1,df2),
      replications=1,columns=c("test","elapsed","relative"),order="relative")

                    test elapsed relative
1      fast_JM(df1, df2)    0.89  1.00000
2      slow_JM(df1, df2)   12.54 14.08989
3 double_merge(df1, df2)   42.50 47.75281

So a speedup of more than 40 times compared to the double merge, and more than 10 times compared to using dataframes.

Upvotes: 1

Chase
Chase

Reputation: 69171

You can use merge() twice to get what you want. By default, merge looks for common column names to join. In the second merge, we'll specify the column we want it to merge on:

df1 <- data.frame(V1 = c('A', 'A', 'A', 'B', 'B', 'B'), V2 = c('A', 'B', 'C', 'A', 'B', 'C'))
df2 <- data.frame(V1 = c('A', 'B', 'C', 'D'), Va = c(12, 15, 321, 54), Vb = c(23, 53, 543, 325))

merge(merge(df1, df2), df2, by.x = "V2", by.y = "V1", suffixes = c("1", "2"))

Upvotes: 4

Related Questions