Factors are a type of vector in R whose elements are categorical values, optionally ordered. The values are stored internally as integers, with a label (level) attached to each integer.
# In R:
> x = c( "high" , "medium" , "low" , "high" , "medium" )
> xf = factor( x )
> xf
[1] high medium low high medium
Levels: high low medium
> as.numeric(xf)
[1] 1 3 2 1 3
> xfo = factor( x , levels=c("low","medium","high") , ordered=TRUE )
> xfo
[1] high medium low high medium
Levels: low < medium < high
> as.numeric(xfo)
[1] 3 2 1 3 2
I checked the Julia documentation and John Myles White's Comparing Julia and R’s Vocabularies (which might be obsolete), and there seems to be no such concept as a factor. Are factors used often, and what is Julia's solution to this problem?
The PooledDataArray type (defined in the DataArrays package and available through the DataFrames package) is one possible alternative to R's factors. The following implements your example using it:
julia> using DataFrames # install with Pkg.add("DataFrames") if required
julia> x = ["high" , "medium" , "low" , "high" , "medium"];
julia> xf = PooledDataArray(x)
5-element DataArrays.PooledDataArray{ASCIIString,UInt32,1}:
"high"
"medium"
"low"
"high"
"medium"
julia> xf.refs
5-element Array{UInt32,1}:
0x00000001
0x00000003
0x00000002
0x00000001
0x00000003
julia> xfo = PooledDataArray(x,["low","medium","high"]);
julia> xfo.refs
5-element Array{UInt32,1}:
0x00000003
0x00000002
0x00000001
0x00000003
0x00000002
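The refs are indices into the array's pool of distinct values: sorted by default (matching R's alphabetical levels), or in the order you pass explicitly. To make that encoding concrete, here is a minimal sketch of the same idea in plain Base Julia, without DataFrames. The helper pool_encode is hypothetical (not part of DataFrames or DataArrays), and the ordered comparison at the end only illustrates what R's ordered=TRUE gives you; it is not a documented PooledDataArray feature.

```julia
# Hypothetical helper (not a DataFrames/DataArrays API): encode a vector as
# UInt32 codes into a pool of levels. Without explicit levels, the distinct
# values are sorted, mirroring the default order seen in xf.refs.
function pool_encode(x::AbstractVector, levels::AbstractVector = sort(unique(x)))
    # Map each level to its 1-based position in the pool
    index = Dict(level => UInt32(i) for (i, level) in enumerate(levels))
    refs = UInt32[index[v] for v in x]   # throws KeyError if a value is not a level
    return refs, levels
end

x = ["high", "medium", "low", "high", "medium"]

# Default (sorted) pool, like R's factor(x)
refs, pool = pool_encode(x)
println(pool)             # ["high", "low", "medium"]
println(Int.(refs))       # [1, 3, 2, 1, 3]
println(pool[refs] == x)  # true: indexing the pool with the codes decodes them

# Explicit level order, like R's levels=c("low","medium","high")
orefs, opool = pool_encode(x, ["low", "medium", "high"])
println(Int.(orefs))      # [3, 2, 1, 3, 2]

# R-style ordered comparison (cf. xfo < "high"): compare codes, not strings
high = findfirst(==("high"), opool)
println(findall(orefs .< high))   # [2, 3, 5]
```

Storing small integer codes plus one copy of each distinct string is what makes this representation compact, and comparing codes instead of strings is what gives an explicit level order its meaning.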