Nick

Reputation: 9061

What's Julia's solution to R's factor concept?

Factors are a type of vector in R whose elements are categorical values, optionally ordered. The values are stored internally as integers with labeled levels.

# In R:
> x = c( "high" , "medium" , "low" , "high" , "medium" )

> xf = factor( x )
> xf
[1] high     medium low     high     medium
Levels: high low medium

> as.numeric(xf)
[1] 1 3 2 1 3

> xfo = factor( x , levels=c("low","medium","high") , ordered=TRUE )
> xfo
[1] high     medium low     high     medium
Levels: low < medium < high

> as.numeric(xfo)
[1] 3 2 1 3 2

I checked the Julia documentation and John Myles White's Comparing Julia and R’s Vocabularies (which might be obsolete), and there seems to be no such concept as a factor. Are factors used often, and what is Julia's solution to this problem?

Upvotes: 4

Views: 1219

Answers (2)

xiaodai

Reputation: 16064

The CategoricalArray type from CategoricalArrays.jl resembles R's factors.
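
As a rough sketch of how the R example from the question might look with this package: the snippet below assumes a reasonably recent CategoricalArrays.jl release, where `categorical` accepts the `levels` and `ordered` keywords and `levelcode` is available. Treat it as an illustration rather than the only way to do it.

```julia
# Assumes the package is installed: import Pkg; Pkg.add("CategoricalArrays")
using CategoricalArrays

x = ["high", "medium", "low", "high", "medium"]

# Unordered factor: by default the levels are the sorted unique values,
# i.e. "high", "low", "medium"
xf = categorical(x)
levels(xf)        # the level labels
levelcode.(xf)    # integer code per element, similar to as.numeric(xf) in R: [1, 3, 2, 1, 3]

# Ordered factor with an explicit level order, similar to
# factor(x, levels=c("low","medium","high"), ordered=TRUE) in R
xfo = categorical(x; levels=["low", "medium", "high"], ordered=true)
isordered(xfo)    # true
levelcode.(xfo)   # [3, 2, 1, 3, 2]
xfo[1] > xfo[2]   # comparisons follow the level order: "high" > "medium"
```

The integer codes match the `as.numeric` output shown in the question, since both R and this package sort the levels alphabetically unless an explicit order is given.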

Upvotes: 0

Dan Getz

Reputation: 18227

The PooledDataArray in the DataFrames package is one possible alternative corresponding to R's factors. The following implements your example using it:

julia> using DataFrames # install with Pkg.add("DataFrames") if required

julia> x = ["high" , "medium" , "low" , "high" , "medium"];

julia> xf = PooledDataArray(x)
5-element DataArrays.PooledDataArray{ASCIIString,UInt32,1}:
 "high"  
 "medium"
 "low"   
 "high"  
 "medium"

julia> xf.refs
5-element Array{UInt32,1}:
 0x00000001
 0x00000003
 0x00000002
 0x00000001
 0x00000003

julia> xfo = PooledDataArray(x,["low","medium","high"]);

julia> xfo.refs
5-element Array{UInt32,1}:
 0x00000003
 0x00000002
 0x00000001
 0x00000003
 0x00000002

Upvotes: 3
