rvrvrv

Reputation: 911

Append values from column 2 to values from column 1

In R, I have two data frames (A and B) that share columns (1, 2 and 3). Column 1 has a unique identifier, and is the same for each data frame; columns 2 and 3 have different information. I'm trying to merge these two data frames to get 1 new data frame that has columns 1, 2, and 3, and in which the values in column 2 and 3 are concatenated: i.e. column 2 of the new data frame contains: [data frame A column 2 + data frame B column 2]

Example:

dfA <- data.frame(Name = c("John","James","Peter"),
                  Score = c(2,4,0),
                  Response = c("1,0,0,1","1,1,1,1","0,0,0,0"))

dfB <- data.frame(Name = c("John","James","Peter"),
                  Score = c(3,1,4),
                  Response = c("0,1,1,1","0,1,0,0","1,1,1,1"))

dfA:
    Name Score Response
1  John     2  1,0,0,1
2 James     4  1,1,1,1
3 Peter     0  0,0,0,0

dfB:
   Name Score Response
1  John     3  0,1,1,1
2 James     1  0,1,0,0
3 Peter     4  1,1,1,1

This should result in:

dfNew <- data.frame(Name = c("John","James","Peter"),
                    Score = c(5,5,4),
                    Response = c("1,0,0,1,0,1,1,1","1,1,1,1,0,1,0,0","0,0,0,0,1,1,1,1"))

dfNew:
   Name Score Response
1  John     5  1,0,0,1,0,1,1,1
2 James     5  1,1,1,1,0,1,0,0
3 Peter     4  0,0,0,0,1,1,1,1

I've tried merge, but that simply places the columns side by side (much like cbind).

Is there a way to do this, without having to cycle through all columns, like:

dfNew <- dfA["Name"]  # start from the shared identifier column
dfNew$Score <- dfA$Score + dfB$Score
dfNew$Response <- paste(dfA$Response, dfB$Response, sep=",")

The added difficulty is, as you might have noticed, that for some columns we need to use addition, whereas others require concatenation separated by a comma (the columns requiring addition are formatted as numerical, the others as text, which might make it easier?)

Thanks in advance!

PS. The string 1,0,0,1,0,1,1,1 etc. captures the response per trial – this example has 8 trials to which participants can either respond correctly (1) or incorrectly (0); the final score is collected under Score. Just to explain why my data/example looks the way it does.

Upvotes: 1

Views: 183

Answers (2)

Henrik

Reputation: 67828

Personally, I would try to avoid concatenating 'response per trial' to a single variable ('Response') from the start, in order to make the data less static and facilitate any subsequent steps of analysis or data management. Given that the individual trials already are concatenated, as in your example, I would therefore consider splitting them up. Formatting the data frame for a final, pretty, printed output I would consider a different, later issue.

# merge data (cbind would also work if data are ordered properly)
df <- merge(x = dfA[ , c("Name", "Response")], y = dfB[ , c("Name", "Response")],
            by = "Name")

# rename
names(df) <- c("Name", c("A", "B"))

# split concatenated columns
library(splitstackshape)
df2 <- concat.split.multiple(data = df, split.cols = c("A", "B"),
                             seps = ",", direction = "wide")

# calculate score
df2$Score <- rowSums(df2[ , -1])
df2
#    Name A_1 A_2 A_3 A_4 B_1 B_2 B_3 B_4 Score
# 1 James   1   1   1   1   0   1   0   0     5
# 2  John   1   0   0   1   0   1   1   1     5
# 3 Peter   0   0   0   0   1   1   1   1     4
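If installing splitstackshape is not an option, the same splitting step can be sketched in base R with strsplit. This is only a minimal sketch, not part of the approach above: split_trials is a hypothetical helper name, and it assumes every row holds the same number of comma-separated trials.

```r
# Example data from the question (only the columns needed here)
dfA <- data.frame(Name = c("John", "James", "Peter"),
                  Response = c("1,0,0,1", "1,1,1,1", "0,0,0,0"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(Name = c("John", "James", "Peter"),
                  Response = c("0,1,1,1", "0,1,0,0", "1,1,1,1"),
                  stringsAsFactors = FALSE)

# Align rows by Name; columns become Response_A and Response_B
df <- merge(dfA, dfB, by = "Name", suffixes = c("_A", "_B"))

# Hypothetical helper: turn comma-separated strings into a numeric matrix
# with one column per trial (assumes equal trial counts in every row)
split_trials <- function(x, prefix) {
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  m <- do.call(rbind, lapply(parts, as.numeric))
  colnames(m) <- paste0(prefix, "_", seq_len(ncol(m)))
  m
}

trials <- cbind(split_trials(df$Response_A, "A"),
                split_trials(df$Response_B, "B"))

df2 <- data.frame(Name = df$Name, trials)
df2$Score <- rowSums(trials)  # total correct responses across both sets
df2
```

Keeping the trials as a numeric matrix makes per-trial summaries (e.g. colMeans(trials)) straightforward later on.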

Upvotes: 2

Brian Diggs

Reputation: 58875

I would approach this with a for loop over the column names you want to merge. Given your example data:

cols <- c("Score", "Response")

dfNew <- dfA[,"Name",drop=FALSE]
for (n in cols) {
  switch(class(dfA[[n]]),
         "numeric" = {dfNew[[n]] <- dfA[[n]] + dfB[[n]]},
         "factor"=, "character" = {dfNew[[n]] <- paste(dfA[[n]], dfB[[n]], sep=",")})
}

This solution is basically what you had as your idea, but with a loop. Each column is checked to see whether it is numeric (the values are added) or a string or factor (the strings are concatenated). You could get a similar result by having two vectors of names, one for the numeric columns and one for the character columns, but this approach is extensible if you have other data types as well (though I don't know what they might be). The major drawback of this method is that it assumes the data frames are in the same order with regard to Name. The next solution doesn't make that assumption:

dfNew <- merge(dfA, dfB, by="Name")
for (n in cols) {
  switch(class(dfA[[n]]),
         "numeric" = {dfNew[[n]] <- dfNew[[paste0(n,".x")]] + dfNew[[paste0(n,".y")]]},
         "factor"=, "character" = {dfNew[[n]] <- paste(dfNew[[paste0(n,".x")]], dfNew[[paste0(n,".y")]], sep=",")})
  dfNew[[paste0(n,".x")]] <- NULL
  dfNew[[paste0(n,".y")]] <- NULL
}

Same general idea as before, but this uses merge to make sure that the data are correctly aligned, and then works on the columns within dfNew (whose names are suffixed with ".x" and ".y"). Additional steps get rid of the separate columns after joining. It also has the bonus of carrying along any other columns not listed in cols.
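One caveat when switching on class(): an integer column (for example c(2L, 4L)) has class "integer", which matches none of the branches above, so switch quietly returns NULL and the combined column is never created. A possible variant, sketched below, tests is.numeric instead and treats everything else as text. The function name combine_cols and its arguments are hypothetical, chosen only for illustration.

```r
# Example data from the question
dfA <- data.frame(Name = c("John", "James", "Peter"),
                  Score = c(2, 4, 0),
                  Response = c("1,0,0,1", "1,1,1,1", "0,0,0,0"))
dfB <- data.frame(Name = c("John", "James", "Peter"),
                  Score = c(3, 1, 4),
                  Response = c("0,1,1,1", "0,1,0,0", "1,1,1,1"))

# Hypothetical helper: merge on `by`, then add numeric columns and
# paste all other columns together, separated by `sep`
combine_cols <- function(a, b, cols, by = "Name", sep = ",") {
  merged <- merge(a, b, by = by, suffixes = c(".x", ".y"))
  for (n in cols) {
    x <- merged[[paste0(n, ".x")]]
    y <- merged[[paste0(n, ".y")]]
    # is.numeric() is TRUE for both integer and double columns
    merged[[n]] <- if (is.numeric(x) && is.numeric(y)) {
      x + y
    } else {
      paste(x, y, sep = sep)  # paste() converts factors to their labels
    }
    # drop the separate source columns
    merged[[paste0(n, ".x")]] <- NULL
    merged[[paste0(n, ".y")]] <- NULL
  }
  merged
}

dfNew <- combine_cols(dfA, dfB, cols = c("Score", "Response"))
dfNew
```

Note that merge sorts the result by Name, so the rows come back as James, John, Peter rather than in the original order.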

Upvotes: 1
