o.o
o.o

Reputation: 143

Merge two factor columns in R

Hi I am having trouble with something in R. I'm trying to merge (combine?) two (factor) columns in a data frame. For each row, there is a value in only one of the columns and I want to combine them so that all the rows have a value. As a simplified example, suppose I've run the following code: df <- data.frame(x=c("a", "b", " ", " "), y=c(" ", " ", "q", " "), z=c(" ", " ", " ", "p")), I get the following data frame

    x   y
1   a   
2   b        
3       q

After the x and y columns are merged, The result would be

  x y merged
1 a        a
2 b        b
3   q      q

I have tried using df$merged = ifelse(df$x == " ", df$y, df$x), but it gives me these numbers. Any idea what they mean?

  x y merged
1 a        2
2 b        3
3   q      2

All the other helpful information I have come across works well with numbers, but not characters. Am I on the right track with what I have tried so far?

It seems like such a simple problem but I have not been able to find a solution. Any help would be appreciated.

Thanks all.

Upvotes: 2

Views: 4059

Answers (2)

RockScience
RockScience

Reputation: 18580

The reason the numbers appear is because your data.frame contains factors:
Check ?data.frame and especially the argument stringsAsFactors

df <- data.frame(x=c("a", "b", " ", " "), 
                 y=c(" ", " ", "q", " "), 
                 z=c(" ", " ", " ", "p"), 
                 stringsAsFactors = FALSE)

df$merged = ifelse(df$x == " ", df$y, df$x)

Side note: using space " " is not really recommended for holes in your data.frame. Use instead NA.

Upvotes: 1

akrun
akrun

Reputation: 886938

In your example dataset, there were three columns. The below approach could be used when there are multiple columns. (Here, I assumed that you have only a single "value" in each row)

df$merged <- df[cbind(1:nrow(df),max.col(df!=' ', 'first'))]
df
#  x y z merged
#1 a          a
#2 b          b
#3   q        q
#4     p      p

Or a loop method would be:

apply(df, 1, function(x) x[x!=' '])
#[1] "a" "b" "q" "p"

If there are more than one "value" per row, you can paste the values together. toString is a wrapper for paste(., collapse=", ")

apply(df,1, function(x) toString(x[x!=' ']))

Or you could melt the dataset and then use aggregate to paste the values

library(reshape2)
aggregate(value~Var1, subset(melt(as.matrix(df)), value!= ' '), 
                      toString)$value

data

df <- data.frame(x=c("a", "b", " ", " "), y=c(" ", " ", "q", " "), 
                z=c(" ", " ", " ", "p"))

Upvotes: 2

Related Questions