Márton Oelbei

Reputation: 59

R lists of characters to one data.frame

I've been looking around for quite a while now, but can't seem to solve this problem, although I feel like it should be an easy one.

I have 54 factors containing differing numbers of strings (pathway names, to be exact). For example, here are two of the factors with the elements they contain:

> PWe1
 [1] Gene_Expression                                        
 [2] miR-targeted_genes_in_muscle_cell_-_TarBase            
 [3] Generic_Transcription_Pathway

> PWe2
  [1] miR-targeted_genes_in_epithelium_-_TarBase                           
  [2] miR-targeted_genes_in_leukocytes_-_TarBase                           
  [3] miR-targeted_genes_in_lymphocytes_-_TarBase                          
  [4] miR-targeted_genes_in_muscle_cell_-_TarBase

What I would like to do is combine them into one big data frame with 54 columns, where each column holds the strings of one factor. I've tried cbind, cbind.data.frame and a couple of other options, but those return numeric values instead of the strings.


Expected output:

PWe1                                         PWe2
Gene_Expression                              miR-targeted_genes_in_epithelium_-_TarBase
miR-targeted_genes_in_muscle_cell_-_TarBase  miR-targeted_genes_in_leukocytes_-_TarBase
Generic_Transcription_Pathway                miR-targeted_genes_in_lymphocytes_-_TarBase
NA                                           miR-targeted_genes_in_muscle_cell_-_TarBase

I'm quite a beginner when it comes to R; could anyone nudge me toward a possible solution?

Thanks in advance!

Upvotes: 0

Views: 100

Answers (3)

Pierre L

Reputation: 28451

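lst <- mget(ls(pattern = "^PWe"))        #<--- Create a named list of all PWe* vectors
n <- max(lengths(lst))                   #<--- Find the maximum length
as.data.frame(do.call(cbind,
  lapply(lst, function(x) `length<-`(as.character(x), n))),  #<--- Strings (not factor codes), padded with NA
  stringsAsFactors = FALSE)              #<--- Convert to data.frame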
#                                          PWe1                                        PWe2
# 1                             Gene_Expression  miR-targeted_genes_in_epithelium_-_TarBase
# 2 miR-targeted_genes_in_muscle_cell_-_TarBase  miR-targeted_genes_in_leukocytes_-_TarBase
# 3               Generic_Transcription_Pathway miR-targeted_genes_in_lymphocytes_-_TarBase
# 4                                        <NA> miR-targeted_genes_in_muscle_cell_-_TarBase
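For all 54 factors, the same padding idea can be wrapped in a small function. This is a sketch, assuming the objects are named `PWe1` to `PWe54`; `pad_to_df` is a hypothetical helper name, not an existing function. Building the names with `paste0()` instead of `ls()` keeps the columns in numeric order, because `ls()` sorts names alphabetically (so `PWe10` would come before `PWe2`).

```r
# Hypothetical helper: convert every vector in a list to character,
# pad it with NA to the length of the longest one, and bind the
# results as columns of a data.frame.
pad_to_df <- function(lst) {
  n <- max(lengths(lst))              # length of the longest vector
  padded <- lapply(lst, function(x) {
    x <- as.character(x)              # factor -> strings, not integer codes
    length(x) <- n                    # extend with NA up to n
    x
  })
  data.frame(padded, stringsAsFactors = FALSE)
}

# Example with the two factors from the question; with all 54 it would be
# pad_to_df(mget(paste0("PWe", 1:54)))
PWe1 <- factor(c("Gene_Expression",
                 "miR-targeted_genes_in_muscle_cell_-_TarBase",
                 "Generic_Transcription_Pathway"))
PWe2 <- factor(c("miR-targeted_genes_in_epithelium_-_TarBase",
                 "miR-targeted_genes_in_leukocytes_-_TarBase",
                 "miR-targeted_genes_in_lymphocytes_-_TarBase",
                 "miR-targeted_genes_in_muscle_cell_-_TarBase"))
df <- pad_to_df(mget(paste0("PWe", 1:2)))
print(df)
```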

Upvotes: 2

Sotos

Reputation: 51592

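# v1 and v2 stand for two of the factors, e.g. PWe1 and PWe2
l1 <- max(length(v1), length(v2))          # length of the longer vector
length(v1) <- l1                           # pad with NA up to l1
length(v2) <- l1
cbind(as.character(v1), as.character(v2))  # character matrix of strings, not codes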
#     [,1]                                          [,2]                                         
#[1,] "Gene_Expression"                             "miR-targeted_genes_in_epithelium_-_TarBase"
#[2,] "miR-targeted_genes_in_muscle_cell_-_TarBase" "miR-targeted_genes_in_leukocytes_-_TarBase"
#[3,] "Generic_Transcription_Pathway"               "miR-targeted_genes_in_lymphocytes_-_TarBase"
#[4,] NA                                            "miR-targeted_genes_in_muscle_cell_-_TarBase"

Upvotes: 1

denise

Reputation: 159

If you convert your factors to character vectors before you use cbind, you don't get numeric values:

    testFrame <- data.frame(cbind(as.character(PWe1), as.character(PWe2)), stringsAsFactors = FALSE)
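A small sketch of why the conversion matters, using two made-up factors `f1` and `f2` (hypothetical stand-ins for `PWe1` and `PWe2`): a factor is stored as integer codes plus a set of levels, and `cbind()` drops the factor class and keeps only the codes, which is where the numbers in the question come from.

```r
# Two small hypothetical factors
f1 <- factor(c("Gene_Expression", "Generic_Transcription_Pathway"))
f2 <- factor(c("miR-targeted_genes_in_leukocytes_-_TarBase",
               "miR-targeted_genes_in_epithelium_-_TarBase"))

# A factor is stored as integer codes plus a "levels" attribute
print(typeof(f1))   # "integer"
print(levels(f2))   # the distinct strings, sorted

# cbind() discards the factor class and keeps only the integer codes
codes <- cbind(f1, f2)
print(codes)

# as.character() maps each code back to its level, so the strings survive
strings <- cbind(as.character(f1), as.character(f2))
print(strings)
```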

If the vectors differ in length, cbind recycles the shorter ones to the length of the longest, and it only throws a warning when the longer length is not a multiple of the shorter one. If that recycling is not acceptable in your case, maybe a data.frame is not the right choice of object?
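The recycling rule can be checked with a few short made-up vectors (`a`, `b` and `c3` are hypothetical): lengths 2 and 4 recycle silently, while lengths 3 and 4 recycle with a warning. In neither case are the missing entries filled with `NA`, which is why the other answers pad with `length<-` first.

```r
# Hypothetical short vectors to show how cbind() recycles
a  <- c("x", "y")
b  <- c("p", "q", "r", "s")
c3 <- c("u", "v", "w")

# 4 is a multiple of 2: column a is recycled silently, no warning
m1 <- cbind(a, b)
print(m1)   # rows 3 and 4 of column a repeat "x" and "y"

# 4 is not a multiple of 3: column c3 is recycled and cbind() warns
m2 <- withCallingHandlers(
  cbind(c3, b),
  warning = function(w) {
    message("cbind warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(m2)   # row 4 of column c3 is "u" again, not NA
```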

Upvotes: 1
