Maciek Leks

Reputation: 1448

R's table function in Julia (for DataFrames)

Is there something like R's table function in Julia? I've read about xtab, but do not know how to use it.

Suppose we have an R data.frame rdata whose column col6 is of type factor.

R sample code:

rdata <- read.csv("mycsv.csv")  #1
table(rdata$col6)               #2

To read the data and create factors in Julia, I do this:

using DataFrames
jldata = readtable("mycsv.csv", makefactors=true)  #1 :col6 will now be pooled

..., but how do I build something like R's table in Julia (i.e., how do I achieve #2)?

Upvotes: 8

Views: 3389

Answers (3)

PKumar

Reputation: 11128

I believe "by" has been removed from DataFrames.jl, not just deprecated (on Julia 1.5.3, calling it gives: ERROR: ArgumentError: by function was removed from DataFrames.jl).

So here are some alternatives: we can use split-apply-combine (groupby followed by combine) to do counts and cross tabs, or use the FreqTables package.

Using Split Combine:

Example 1 - Single column:

using RDatasets
using DataFrames

mtcars = dataset("datasets", "mtcars")

## Equivalent of R's table() on the Cyl column

gdf = groupby(mtcars, :Cyl)
combine(gdf, nrow)

Output:

#    3×2 DataFrame
#     Row │ Cyl    nrow
#         │ Int64  Int64
#    ─────┼──────────────
#       1 │     6      7
#       2 │     4     11
#       3 │     8     14

Example 2 - Cross tab between two columns:

## only the groupby call changes; the rest is the same

gdf = groupby(mtcars, [:Cyl, :AM])
combine(gdf, nrow) 

Output:

#6×3 DataFrame
# Row │ Cyl    AM     nrow
#     │ Int64  Int64  Int64
#─────┼─────────────────────
#   1 │     6      1      3
#   2 │     4      1      8
#   3 │     6      0      4
#   4 │     8      0     12
#   5 │     4      0      3
#   6 │     8      1      2
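R's table(x, y) prints a wide matrix instead of one row per combination. A minimal sketch, assuming DataFrames.jl 1.x, that reshapes the long combine result into that wide layout with unstack; the toy columns a and b are hypothetical stand-ins for Cyl and AM:

```julia
using DataFrames

# Hypothetical toy data standing in for mtcars' Cyl and AM columns
df = DataFrame(a = [1, 1, 2, 2, 2, 1, 2],
               b = ["x", "y", "x", "x", "y", "x", "y"])

# Long format: one row per (a, b) combination with its count
long = combine(groupby(df, [:a, :b]), nrow => :n)

# Wide format: one row per level of a, one column per level of b,
# cells hold the counts (combinations that never occur would be `missing`)
wide = sort(unstack(long, :a, :b, :n), :a)
println(wide)
```

Sorting after unstack is done explicitly so the row order does not depend on how groupby orders its groups.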

As a side note, if you don't like the column name nrow, you can use combine(gdf, nrow => :Count) to rename it to Count.
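R's table() also lists its levels in sorted order, while the groups in the Example 1 output above come out as 6, 4, 8. A small sketch, assuming DataFrames.jl 0.22 or later and a hypothetical toy column g, that renames the count column and then sorts by the key:

```julia
using DataFrames

# Hypothetical toy data; in the examples above this would be mtcars.Cyl
df = DataFrame(g = ["b", "a", "b", "c", "b", "a"])

# Count rows per level, call the count column :Count instead of :nrow,
# then sort by the key so the result reads like R's table()
counts = sort(combine(groupby(df, :g), nrow => :Count), :g)
println(counts)
```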

Alternative: using FreqTables

You can use the FreqTables package, as below, to compute counts and proportions easily. To install it, run using Pkg; Pkg.add("FreqTables"):

using FreqTables

## Cross tab between cyl and am
freqtable(mtcars.Cyl, mtcars.AM)

## Proportion between cyl and am
prop(freqtable(mtcars.Cyl, mtcars.AM))

## with a margin argument, as in R's prop.table (column-wise proportions: margins=2)
prop(freqtable(mtcars.Cyl, mtcars.AM), margins=2)

## row-wise proportions: margins=1
prop(freqtable(mtcars.Cyl, mtcars.AM), margins=1)

Outputs:

## count cross tabs
#3×2 Named Array{Int64,2}
#Dim1 ╲ Dim2 │  0   1
#────────────┼───────
#4           │  3   8
#6           │  4   3
#8           │ 12   2

## proportion wise (overall)
#3×2 Named Array{Float64,2}
#Dim1 ╲ Dim2 │       0        1
#────────────┼─────────────────
#4           │ 0.09375     0.25
#6           │   0.125  0.09375
#8           │   0.375   0.0625


## Column wise proportion
#3×2 Named Array{Float64,2}
#Dim1 ╲ Dim2 │        0         1
#────────────┼───────────────────
#4           │ 0.157895  0.615385
#6           │ 0.210526  0.230769
#8           │ 0.631579  0.153846

## Row wise proportion
#3×2 Named Array{Float64,2}
#Dim1 ╲ Dim2 │        0         1
#────────────┼───────────────────
#4           │ 0.272727  0.727273
#6           │ 0.571429  0.428571
#8           │ 0.857143  0.142857
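To see what freqtable and prop(...; margins=...) compute, here is a Base-Julia-only sketch (it does not use FreqTables, and the helper name crosstab is hypothetical): it counts each (x, y) pair into a matrix, then divides by the grand total, the column sums, or the row sums.

```julia
# Hypothetical helper: two-way count table using only Base Julia.
# Returns the sorted levels of x, the sorted levels of y, and a count matrix.
function crosstab(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(ArgumentError("x and y must have equal length"))
    xlev = sort(unique(x))
    ylev = sort(unique(y))
    # Map each level to its row / column index
    xi = Dict(v => i for (i, v) in enumerate(xlev))
    yi = Dict(v => j for (j, v) in enumerate(ylev))
    counts = zeros(Int, length(xlev), length(ylev))
    for (a, b) in zip(x, y)
        counts[xi[a], yi[b]] += 1
    end
    return xlev, ylev, counts
end

# Hypothetical toy data (not the full mtcars columns)
cyl = [4, 6, 4, 8, 8, 6, 4, 8]
am  = [1, 0, 1, 0, 0, 1, 0, 0]

xlev, ylev, counts = crosstab(cyl, am)

# A row or column whose total is zero would produce NaN (0/0) here
overall = counts ./ sum(counts)           # like prop(tbl): all cells sum to 1
colprop = counts ./ sum(counts, dims=1)   # like prop(tbl, margins=2): each column sums to 1
rowprop = counts ./ sum(counts, dims=2)   # like prop(tbl, margins=1): each row sums to 1
```

The divisions rely on broadcasting: sum(counts, dims=1) is a 1×n row of column totals and sum(counts, dims=2) is an m×1 column of row totals, so ./ divides every cell by the matching total.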

Upvotes: 6

Maciek Leks

Reputation: 1448

I came to the conclusion that a similar effect can be achieved using by:

Suppose jldata contains a :gender column.

julia> by(jldata, :gender, nrow)
3x2 DataFrames.DataFrame
| Row | gender   | x1    |
|-----|----------|-------|
| 1   | NA       | 175   |
| 2   | "female" | 40254 |
| 3   | "male"   | 58574 |

Of course, it's not a table, but at least I get the same data type as the data source. Surprisingly, by seems to be faster than countmap.

Upvotes: 7

Andreas Noack

Reputation: 1380

You can use the countmap function from StatsBase.jl to count the entries of a single variable. General cross tabulation and statistical tests for contingency tables are lacking at this point. As Ismael points out, this has been discussed in the issue tracker for StatsBase.jl.
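StatsBase's countmap(x) returns a Dict mapping each distinct value to its count. If that is all you need, a rough Base-only equivalent is sketched below; the name counts_of is hypothetical and is not a StatsBase function.

```julia
# Hypothetical Base-only stand-in for StatsBase.countmap:
# maps each distinct value of x to the number of times it occurs.
function counts_of(x)
    d = Dict{eltype(x), Int}()
    for v in x
        d[v] = get(d, v, 0) + 1
    end
    return d
end

# Hypothetical toy data resembling the :gender column above
gender = ["male", "female", "male", "male", "female"]
tab = counts_of(gender)

# Print in sorted key order, similar to how R's table() lists levels
for k in sort(collect(keys(tab)))
    println(k, " => ", tab[k])
end
```

A Dict has no defined iteration order, which is why the keys are sorted before printing.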

Upvotes: 8
