Reputation: 143
Hi I am having trouble with something in R. I'm trying to merge (combine?) two (factor) columns in a data frame. For each row, there is a value in only one of the columns and I want to combine them so that all the rows have a value. As a simplified example, suppose I've run the following code: df <- data.frame(x=c("a", "b", " ", " "), y=c(" ", " ", "q", " "), z=c(" ", " ", " ", "p"))
, I get the following data frame
x y
1 a
2 b
3 q
After the x and y columns are merged, The result would be
x y merged
1 a a
2 b b
3 q q
I have tried using df$merged = ifelse(df$x == " ", df$y, df$x)
, but it gives me these numbers. Any idea what they mean?
x y merged
1 a 2
2 b 3
3 q 2
All the other helpful information I have come across works well with numbers, but not characters. Am I on the right track with what I have tried so far?
It seems like such a simple problem but I have not been able to find a solution. Any help would be appreciated.
Thanks all.
Upvotes: 2
Views: 4059
Reputation: 18580
The reason the numbers appear is because your data.frame contains factors
:
Check ?data.frame
and especially the argument stringsAsFactors
df <- data.frame(x=c("a", "b", " ", " "),
y=c(" ", " ", "q", " "),
z=c(" ", " ", " ", "p"),
stringsAsFactors = FALSE)
df$merged = ifelse(df$x == " ", df$y, df$x)
Side note: using space " "
is not really recommended for holes in your data.frame. Use instead NA
.
Upvotes: 1
Reputation: 886938
In your example dataset, there were three columns. The below approach could be used when there are multiple columns. (Here, I assumed that you have only a single "value" in each row)
df$merged <- df[cbind(1:nrow(df),max.col(df!=' ', 'first'))]
df
# x y z merged
#1 a a
#2 b b
#3 q q
#4 p p
Or a loop method would be:
apply(df, 1, function(x) x[x!=' '])
#[1] "a" "b" "q" "p"
If there are more than one "value" per row, you can paste
the values together. toString
is a wrapper for paste(., collapse=", ")
apply(df,1, function(x) toString(x[x!=' ']))
Or you could melt
the dataset and then use aggregate
to paste
the values
library(reshape2)
aggregate(value~Var1, subset(melt(as.matrix(df)), value!= ' '),
toString)$value
df <- data.frame(x=c("a", "b", " ", " "), y=c(" ", " ", "q", " "),
z=c(" ", " ", " ", "p"))
Upvotes: 2