Sheila

Reputation: 2597

Having numeric and character data types in the same column of a data frame?

I have a large data frame (570 rows by 200000 columns) in R. For those of you who are familiar with PLINK, I am trying to create a PED file for a GWAS analysis. PLINK requires every missing allele to be coded as 0. The non-missing values are "A", "T", "C", or "G".

For example, the data frame looks like this:

           COL1     COL2 
     PT1    A        T      
     PT2    T        T     
     PT3    A        A
     PT4    A        T        
     PT5    0        0
     PT6    A        A 
     PT7    T        A
     PTn    T        T

When I run my file in Plink, I get an error. I went back to check my file in R and found that the zeros were of type "character". Is it possible to have two different data types (numeric and character) in a given column in R? I've tried making the 0s numeric while keeping the letters as characters, but it doesn't work.

Upvotes: 1

Views: 559

Answers (1)

Valentin Ruano

Reputation: 2809

I think Justin's advice will probably fix your problem with Plink, but to answer the question you put in bold:

Is it possible to have two different data types (numeric and character) in a given column in R?

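Not really. A data frame column is normally an atomic vector, so all of its elements share a single type; as soon as a 0 sits next to "A", the 0 is coerced to the character "0". In this particular scenario, though, where the variable is discrete, the answer is kind of yes: R has the factor type, which is similar to an enumerated type (enum) in other languages. A factor stores integer codes together with a set of character labels called levels.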
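A small sketch of both points: combining a number with letters silently produces a character vector, while a factor keeps integer storage behind character labels. The names `mixed` and `geno` are illustrative only.

```r
# Every element of an atomic vector (and so of a typical data frame
# column) has the same type; mixing 0 with letters coerces 0 to "0".
mixed <- c(0, "A", "T")
print(mixed)          # "0" "A" "T"
print(class(mixed))   # "character"

# A factor is stored as integer codes plus a character vector of labels.
geno <- factor(c("0", "A", "T"), levels = c("0", "A", "T", "G", "C"))
print(typeof(geno))   # "integer": the underlying storage
print(class(geno))    # "factor"
print(levels(geno))   # "0" "A" "T" "G" "C"
```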

For example try this:

x <- factor(c("0", "A", "C", "G", "T"), levels = c("0", "A", "T", "G", "C"))
print(x)

[1] 0 A C G T
Levels: 0 A T G C

You can convert a factor back to integers (the codes follow the order of the levels, so the first level is 1) or to characters:

> as.integer(x)
[1] 1 2 5 4 3

> as.character(x)
[1] "0" "A" "C" "G" "T"
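The integer codes are positions in `levels(x)`, which leads to a common pitfall worth a quick sketch: the level "0" has code 1 here, so `as.integer()` on the factor does not give back the number 0. Going through `as.character()` first recovers the label. The variable `recovered` is just an illustrative name.

```r
x <- factor(c("0", "A", "C", "G", "T"), levels = c("0", "A", "T", "G", "C"))

# The codes index into levels(x), so the labels can be rebuilt from them.
recovered <- levels(x)[as.integer(x)]
print(recovered)                        # "0" "A" "C" "G" "T"

# Pitfall: the level "0" is the first level, so its code is 1, not 0.
print(as.integer(x)[1])                 # 1
# Convert the label, not the code, to get the number 0 back.
print(as.integer(as.character(x[1])))   # 0
```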

When you read a table with read.table, you can ask for all character columns, including quoted fields, to be read as factors by setting stringsAsFactors = TRUE (since R 4.0.0 the default is FALSE):

mydata <- read.table("yourData.tsv", stringsAsFactors = TRUE)
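A self-contained sketch of the same call, assuming a tab-separated layout with a patient ID followed by allele columns. The inline `txt` data is hypothetical and stands in for "yourData.tsv". It also shows one caveat: a column in which every value is 0 looks numeric, so it is still read as integer even with stringsAsFactors = TRUE; `colClasses = "factor"` forces the class instead of letting R guess.

```r
# Hypothetical inline data standing in for "yourData.tsv".
txt <- "PT1\tA\tT\nPT2\tT\tT\nPT5\t0\t0"
mydata <- read.table(text = txt, sep = "\t", stringsAsFactors = TRUE)
str(mydata)   # V2 and V3 are factors; their levels include "0"

# Caveat: a column containing only 0 is guessed to be integer.
allzero <- read.table(text = "PT1\t0\nPT2\t0", sep = "\t",
                      stringsAsFactors = TRUE)
print(class(allzero$V2))   # "integer"

# colClasses (recycled over all columns) fixes the class up front.
forced <- read.table(text = "PT1\t0\nPT2\t0", sep = "\t",
                     colClasses = "factor")
print(class(forced$V2))    # "factor"
```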

Upvotes: 2
