Reputation: 320
I have a very large data table which consists of a response variable which can be zero (failure) or 1 success. In the data.table there follow very many columns, each one represents a factor by which the response rate may vary. I would like to run an identical analysis over each column of the data which will produce various things, for example a table of the response rate by that column, a histogram and so on.
An example of the table I would like to produce is shown below:
data.dt <- data.table(response=round(runif(100,0,1),0),
factor1 = sample(LETTERS,100, replace=TRUE),
factor2 = sample(letters,100, replace=TRUE)
)
setkey(data.dt, factor1)
table.dt <- data.dt[,list(patients=.N,
successes=sum(response)
),
by=list(factor1)
]
table.dt[,successesRate := round(successes/patients,2)]
table.dt
Just sticking to the table for now, how can I run this table over each column of the data.table (in this case over factor1 and factor2) and store the resulting tables separately (possibly in a list) for further reference?
Many thanks.
Upvotes: 1
Views: 118
Reputation: 5831
I would melt the data first:
library(reshape2)
data.dt.melt <- melt(data.dt, id.vars="response",
variable.name="table", value.name="factor")
table.dt <- data.dt.melt[,list(patients=.N,
successes=sum(response)),
by=list(table, factor)]
table factor patients successes
1: factor1 A 4 3
2: factor1 B 3 0
3: factor1 C 3 2
4: factor1 D 2 1
5: factor1 E 7 3
6: factor1 F 4 1
...
47: factor2 x 2 0
48: factor2 i 3 0
49: factor2 t 2 1
50: factor2 e 4 1
51: factor2 a 2 1
52: factor2 k 3 2
You could probably stop there with just one table and use the "table" column to select the "factor" that you want to work with. But if you prefer splitting them in to a list of two tables:
table.list <- list(factor1=table.dt[table=="factor1"],!"table",with=F],
factor2=table.dt[table=="factor2"][,!"table",with=F])
Upvotes: 2