This is a very basic example, but in my data analysis I keep writing very similar SQL count queries, like the ones below, to generate probability tables.
My tables are defined such that a value of 0 implies that an event did not take place while a value of 1 implies that the event did take place.
> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0")
count(distinct Date)
1 1081
> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 0 and C_O_Below_prevLow = 0 and E_halfGap = 1")
count(distinct Date)
1 956
> sqldf("select count(distinct Date) from joinedData where C_O_Above_prevHigh = 1 OR C_O_Below_prevLow = 1 and E_halfGap = 1")
count(distinct Date)
1 504
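In SQL, AND binds more tightly than OR, so the third query is evaluated as C_O_Above_prevHigh = 1 OR (C_O_Below_prevLow = 1 AND E_halfGap = 1). If the intended meaning is "either predictor fired and the outcome occurred", the OR needs parentheses: where (C_O_Above_prevHigh = 1 OR C_O_Below_prevLow = 1) and E_halfGap = 1. R's & and | follow the same precedence rule. The sketch below shows the difference on a small made-up data frame; the values are hypothetical and are not the data behind the counts above.

```r
# Hypothetical toy data using the same 0/1 column names as joinedData
toy <- data.frame(
  Date               = as.Date("2020-01-01") + 0:3,
  C_O_Above_prevHigh = c(1, 0, 0, 1),
  C_O_Below_prevLow  = c(0, 1, 0, 0),
  E_halfGap          = c(0, 1, 1, 1)
)

# Without parentheses, & binds tighter than | (like AND/OR in SQL):
# this evaluates A == 1 | (B == 1 & E == 1)
unparenthesized <- with(toy,
  C_O_Above_prevHigh == 1 | C_O_Below_prevLow == 1 & E_halfGap == 1)

# With parentheses: (A == 1 | B == 1) & E == 1
parenthesized <- with(toy,
  (C_O_Above_prevHigh == 1 | C_O_Below_prevLow == 1) & E_halfGap == 1)

# Count distinct dates matching each condition, like count(distinct Date)
n_unparen <- length(unique(toy$Date[unparenthesized]))
n_paren   <- length(unique(toy$Date[parenthesized]))
print(c(unparenthesized = n_unparen, parenthesized = n_paren))
```

The first row (A = 1, outcome = 0) is counted only by the unparenthesized version, which is why the two counts differ.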
In the above example, my predictor variables are C_O_Above_prevHigh and C_O_Below_prevLow, and my outcome variable is E_halfGap. There are several cases where there might be more predictor variables, e.g. Time.
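The repetitive count queries above can be generated in one pass with base R's aggregate(): group by every combination of predictor values, count the distinct dates in each group, and count how many of those dates have the outcome equal to 1. This is a minimal sketch; prob_table() is a hypothetical helper, the toy data is random, and it assumes the predictor and outcome flags are constant within a date (so keeping one row per Date matches count(distinct Date)).

```r
# Hypothetical helper: for every combination of predictor values, count the
# distinct dates and estimate P(outcome = 1 | predictors)
prob_table <- function(data, predictors, outcome, id = "Date") {
  # Keep one row per id; assumes the flags do not vary within a date
  data <- data[!duplicated(data[[id]]), , drop = FALSE]

  # Number of distinct dates per predictor combination
  n <- aggregate(data[[id]], by = data[predictors], FUN = length)
  names(n)[ncol(n)] <- "n_dates"

  # Number of those dates on which the outcome took place (value 1)
  ev <- aggregate(data[[outcome]] == 1, by = data[predictors], FUN = sum)
  names(ev)[ncol(ev)] <- "n_event"

  out <- merge(n, ev, by = predictors)
  out$p_event <- out$n_event / out$n_dates
  out
}

# Random toy data standing in for joinedData
set.seed(1)
toy <- data.frame(
  Date               = as.Date("2020-01-01") + 0:99,
  C_O_Above_prevHigh = rbinom(100, 1, 0.3),
  C_O_Below_prevLow  = rbinom(100, 1, 0.3),
  E_halfGap          = rbinom(100, 1, 0.5)
)

# Add more column names (e.g. a Time column) to the vector to condition on them too
res <- prob_table(toy, c("C_O_Above_prevHigh", "C_O_Below_prevLow"), "E_halfGap")
print(res)
```

Each row of the result corresponds to one of the hand-written queries: n_dates is the count for the predictor condition alone, and n_event is the count with E_halfGap = 1 added.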
Rather than manually writing a query for every permutation as above, is there anything available in R or some other application that will:
1) output the potential probability paths based on my predictors, and
2) allow me to choose how to split the paths?
I appreciate your input.
If you want all totals and subtotals, you can use GROUP BY CUBE(...) in SQL (but it is not supported by SQLite, which sqldf uses by default) or addmargins in R.
addmargins( Titanic )
# More readable:
ftable( addmargins( Titanic ) )
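The same idea applies to the 0/1 columns in the question: cross-tabulate the predictors and the outcome, let addmargins() add every subtotal, and use prop.table() over the predictor margins to turn counts into conditional probabilities P(E_halfGap | predictors). This is a sketch on random toy data using the question's column names; it counts rows, not distinct dates, so it assumes one row per date.

```r
# Random toy data with the question's 0/1 columns (one row per date assumed)
set.seed(42)
toy <- data.frame(
  C_O_Above_prevHigh = rbinom(200, 1, 0.3),
  C_O_Below_prevLow  = rbinom(200, 1, 0.3),
  E_halfGap          = rbinom(200, 1, 0.5)
)

# Three-way contingency table of counts
tab <- with(toy, table(C_O_Above_prevHigh, C_O_Below_prevLow, E_halfGap))

# All totals and subtotals, with the outcome laid out across the columns
print(ftable(addmargins(tab), col.vars = "E_halfGap"))

# Conditional probabilities: normalise over the outcome within each
# predictor combination (dimensions 1 and 2); empty cells would give NaN
cond <- prop.table(tab, margin = c(1, 2))
print(ftable(cond, col.vars = "E_halfGap"))
```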
If you want to build a decision tree, you can use the rpart package, or check the machine learning or graphical models task views on CRAN.
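As a rough sketch of the decision-tree route: rpart() with method = "class" chooses the splits itself, and each root-to-leaf path in the printed tree is one "probability path" with the estimated class probabilities at its leaf. How the paths are split can be steered through rpart.control() (for example maxdepth, minsplit and cp). The data below is simulated, and the dependence of the outcome on the predictors is made up so that the tree has something to find.

```r
library(rpart)  # a recommended package shipped with standard R installations

set.seed(7)
n <- 500
toy <- data.frame(
  C_O_Above_prevHigh = factor(rbinom(n, 1, 0.3)),
  C_O_Below_prevLow  = factor(rbinom(n, 1, 0.3))
)
# Hypothetical outcome: more likely when either predictor fired
p <- ifelse(toy$C_O_Above_prevHigh == "1" | toy$C_O_Below_prevLow == "1", 0.8, 0.3)
toy$E_halfGap <- factor(rbinom(n, 1, p))

# Classification tree; maxdepth limits how long each path can get
fit <- rpart(E_halfGap ~ C_O_Above_prevHigh + C_O_Below_prevLow,
             data = toy, method = "class",
             control = rpart.control(maxdepth = 3, cp = 0.01))

# Each node line shows the split rule, n, loss, predicted class and class probabilities
print(fit)
```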