Reputation: 1568

R Converting Factors into New Variables

I have two factor variables with many levels: V1 has 400 levels and V2 has ≈ 250 levels. How can I spread each person's V2 levels across several new variables, using V1 as the unique identifier?

V1             V2
Garza, Mike    a
Garza, Mike    b
Smith, James   a 
Smith, James   f 
Smith, James   z 
Moore, Jen     b
Klein, April   f
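To make the example reproducible, here is a sketch of the sample data above as an R data frame. The name `df` and the choice to store both columns as factors are assumptions; `df` matches the name two of the answers below use.

```r
# Assumed reproduction of the sample data; stringsAsFactors = TRUE makes
# both V1 and V2 factors, as described in the question
df <- data.frame(
  V1 = c("Garza, Mike", "Garza, Mike",
         "Smith, James", "Smith, James", "Smith, James",
         "Moore, Jen", "Klein, April"),
  V2 = c("a", "b", "a", "f", "z", "b", "f"),
  stringsAsFactors = TRUE
)
str(df)
```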

The data frame should look like the example below. Note that each new variable holds whichever levels a person happens to have; it is not one variable per level. Mike has two levels, so a and b go into V2 and V3, whereas Jen's only level, b, also goes into V2, not V3.

V1             V2 V3 V4 V5
Garza, Mike    a  b
Smith, James   a  f  z
Moore, Jen     b
Klein, April   f

Any help would be greatly appreciated!

Thank you.

Upvotes: 1

Answers (3)

mfidino

Reputation: 3055

You can do the first part with dcast from the reshape2 package and then shift the values into your desired layout with apply.

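library(reshape2)  # provides dcast()

dat <- data.frame(V1 = factor(c("Garza", "Garza",
                          "Smith", "Smith", "Smith",
                          "Moore", "Klein")),
                  V2 = c("a","b","a","f","z","b","f"))

# recast your data: one column per V2 level, NA where a person lacks that level
dd <- dcast(dat, V1 ~ V2, value.var = "V2")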

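# make a function to use with apply: for one row, move the non-NA values
# to the front and pad the remaining cells with ""

shift_values <- function(x){
  notna <- which(!is.na(x[-1]))   # positions of the levels present (V1 excluded)
  val <- x[notna + 1]             # +1 skips over the V1 column
  x[-1] <- c(as.character(val), rep("", (length(x) - 1 - length(val))))
  return(x)
}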

# use it in an apply loop, transpose the data, and turn it into a data.frame
result <- data.frame(t(apply(dd, 1, shift_values)))

# change the column names
colnames(result)[-1] <- paste0("V", 2:(ncol(result)))

The data then looks like this:

     V1 V2 V3 V4 V5
1 Garza  a  b      
2 Klein  f         
3 Moore  b         
4 Smith  a  f  z
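For comparison, here is a base-R sketch that reaches the same kind of layout without reshape2. It swaps dcast() and apply() for ave() and stats::reshape(): each person's rows are numbered 1, 2, 3, ..., and those numbers become the new columns. The data frame `df`, the helper column `slot`, and the full names are assumptions taken from the question's sample.

```r
# Assumed sample data from the question
df <- data.frame(
  V1 = c("Garza, Mike", "Garza, Mike",
         "Smith, James", "Smith, James", "Smith, James",
         "Moore, Jen", "Klein, April"),
  V2 = c("a", "b", "a", "f", "z", "b", "f"),
  stringsAsFactors = FALSE
)

# Number each person's rows 1, 2, 3, ... in order of appearance
df$slot <- ave(seq_along(df$V1), df$V1, FUN = seq_along)

# Spread the numbered slots into columns V2.1, V2.2, ...
wide <- reshape(df, idvar = "V1", timevar = "slot", direction = "wide")

# Rename V2.1, V2.2, ... to V2, V3, ... and blank out the padding NAs
names(wide) <- c("V1", paste0("V", seq_len(ncol(wide) - 1) + 1))
wide[is.na(wide)] <- ""
rownames(wide) <- NULL
wide
```

Unlike the dcast() version, this keeps people in order of first appearance and only creates as many columns as the person with the most levels needs (here V2 to V4).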

Upvotes: 1

Mark Peterson

Reputation: 9570

It appears that you want a vector of the V2 levels that are present for each V1 level (individual). That is not really how columns are designed to work in data.frames, even if you can do it in Excel. Instead, I would suggest that you just make the result a vector for each individual, like so:

split(df$V2, df$V1)

which returns:

$`Garza, Mike`
[1] a b
Levels: a b f z

$`Klein, April`
[1] f
Levels: a b f z

$`Moore, Jen`
[1] b
Levels: a b f z

$`Smith, James`
[1] a f z
Levels: a b f z

Without knowing your use case, I can't say whether this will actually be better. However, in my general experience, it tends to be easier to work with. If you just need to print them, you can always collapse them. For example, if you save the above split result to out, you can do this, and the result can then be added as a column to some other output table:

out <- split(df$V2, df$V1)

sapply(out, paste, collapse = ", ")

gives

 Garza, Mike Klein, April   Moore, Jen Smith, James 
      "a, b"          "f"          "b"    "a, f, z"
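If you do eventually need the wide layout from the question, the list returned by split() can be padded and bound into a data frame. This is a minimal sketch; `df` is assumed to be the question's data with V1 and V2 as factors, and `n`, `padded` and `wide` are illustrative names.

```r
# Assumed sample data from the question, both columns as factors
df <- data.frame(
  V1 = c("Garza, Mike", "Garza, Mike",
         "Smith, James", "Smith, James", "Smith, James",
         "Moore, Jen", "Klein, April"),
  V2 = c("a", "b", "a", "f", "z", "b", "f"),
  stringsAsFactors = TRUE
)
out <- split(df$V2, df$V1)

# Pad every element with "" up to the length of the longest one
n <- max(lengths(out))
padded <- lapply(out, function(x) c(as.character(x), rep("", n - length(x))))

# One row per person: V1 holds the name, V2, V3, ... hold the levels
wide <- data.frame(V1 = names(out),
                   do.call(rbind, padded),
                   row.names = NULL,
                   stringsAsFactors = FALSE)
names(wide)[-1] <- paste0("V", seq_len(n) + 1)
wide
```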

Or, if you want to know who has a certain group, you can do this:

sapply(out, function(x){"f" %in% x})

Which gives:

 Garza, Mike Klein, April   Moore, Jen Smith, James 
       FALSE         TRUE        FALSE         TRUE
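To get the names of the people who have a given level, rather than a named logical vector, the same test can be used to index names(out). A small sketch, again assuming `df` holds the question's data; vapply() is used instead of sapply() so the result is guaranteed to be a logical vector.

```r
# Assumed sample data from the question
df <- data.frame(
  V1 = c("Garza, Mike", "Garza, Mike",
         "Smith, James", "Smith, James", "Smith, James",
         "Moore, Jen", "Klein, April"),
  V2 = c("a", "b", "a", "f", "z", "b", "f"),
  stringsAsFactors = TRUE
)
out <- split(df$V2, df$V1)

# TRUE for each person whose vector contains level "f"
has_f <- vapply(out, function(x) "f" %in% x, logical(1))
names(out)[has_f]
```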

Upvotes: 1

Jilber Urbina

Reputation: 61154

This is a reshape problem. Assuming df is your data.frame, you can try this:

> library(reshape2)
> print(dcast(melt(df), ...~V2), na.print="")
Using V1, V2 as id variables
Using V2 as value column: use value.var to override.
           V1 a b f z
1  Garza,Mike a b    
2 Klein,April     f  
3   Moore,Jen   b    
4 Smith,James a   f z
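If reshape2 is not available, base R's table() gives the same person-by-level grid as counts, which can then be turned into the letter layout shown above. This swaps in a cross-tabulation for melt()/dcast(); `df`, `present` and `wide` are assumed names for illustration.

```r
# Assumed sample data from the question
df <- data.frame(
  V1 = c("Garza, Mike", "Garza, Mike",
         "Smith, James", "Smith, James", "Smith, James",
         "Moore, Jen", "Klein, April"),
  V2 = c("a", "b", "a", "f", "z", "b", "f"),
  stringsAsFactors = TRUE
)

# Cross-tabulate: one row per person, one column per V2 level, cells are counts
tab <- table(df$V1, df$V2)

# TRUE where a person has that level
present <- unclass(tab) > 0

# Start from an all-blank character grid and write each level's letter
# into the cells where it is present
wide <- matrix("", nrow(present), ncol(present), dimnames = dimnames(present))
wide[present] <- colnames(present)[col(present)[present]]
print(wide, quote = FALSE)
```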

Upvotes: 3
