Reputation: 345
If I have a dataframe with a key column and data columns, like this
df <- cbind(key=c("Jane", "Jane", "Sam", "Sam", "Mary"), var1=c("a", NA, "a", "a", "c"), var2=c(NA, "b", NA, "b", "d"))
key var1 var2
"Jane" "a" NA
"Jane" NA "b"
"Sam" "a" NA
"Sam" "a" "b"
"Mary" "c" "d"
"Mary" "c" NA
And want a dataframe that merges the rows by name, overwriting NAs whenever possible, like so
key var1 var2
"Jane" "a" "b"
"Sam" "a" "b"
"Mary" "c" "d"
How can I do this?
Upvotes: 5
Views: 1159
Reputation: 103898
Here's a solution using dplyr
. Note that cbind()
creates matrices, not data frames, so I've modified the code to do what I think you meant. I also pulled out the selection algorithm into a separate function. I think this is good practice because it allows you to change your algorithm in one place if you discover you need something different.
df <- data.frame(
key = c("Jane", "Jane", "Sam", "Sam", "Mary"),
var1 = c("a", NA, "a", "a", "c"),
var2 = c(NA, "b", NA, "b", "d"),
stringsAsFactors = FALSE
)
library(dplyr)
collapse <- function(x) x[!is.na(x)][1]
df %.%
group_by(key) %.%
summarise(var1 = collapse(var1), var2 = collapse(var2))
# Source: local data frame [3 x 3]
#
# key var1 var2
# 1 Mary c d
# 2 Sam a b
# 3 Jane a b
Upvotes: 3
Reputation: 55340
library(data.table)
dtt <- as.data.table(df)
dtt[, list(var1=unique(var1[!is.na(var1)])
, var2=unique(var2[!is.na(var2)]))
, by=key]
key var1 var2
1: Jane a b
2: Mary c d
3: Sam a b
Upvotes: 6