Reputation: 11
I have a data.table that looks roughly like this:
library(data.table)
DT <- data.table(disease = c(0,0,1,1,1,1),
                 hospital = c(2,2,4,3,3,2))
Each row corresponds to a person admitted to a hospital. "hospital" is the ID number of the hospital and "disease" is the status of a specific disease (1 = sick, 0 = not sick).
I want to count how many sick people there are at each hospital, including hospitals with no sick people and hospitals that do not appear in this data.table at all, so that I can specify how many hospitals the final table should contain.
By using:
DT[disease==1, .N, keyby= hospital]
I get
hospital N
1: 2 2
2: 3 1
3: 4 1
But if, for example, I want the number of hospitals to be five, the result (it does not have to be a data.table; a matrix would also do) should look something like this:
hospital N
1: 1 0
2: 2 2
3: 3 1
4: 4 1
5: 5 0
Preferably sorted. It could also be a vector of N's as long as it counts hospitals with zero incidents (but then it definitely has to be sorted).
I have a rather big set of data (also with other columns) and this is going on in a loop, so it has to be rather fast.
Thank you in advance.
Upvotes: 1
Views: 153
Reputation: 33488
DT[.(hospital = 1:5, disease = 1), on = .(hospital, disease), .N, by = .EACHI
][, .(hospital, N)]
hospital N
1: 1 0
2: 2 1
3: 3 2
4: 4 1
5: 5 0
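To make the number of hospitals configurable, the same join can be wrapped in a small function. This is only a sketch: `count_sick` and `n_hospitals` are hypothetical names, and it assumes hospital IDs are the integers 1 to `n_hospitals` and that sick patients are coded as `disease == 1`.

```r
library(data.table)

DT <- data.table(disease  = c(0, 0, 1, 1, 1, 1),
                 hospital = c(2, 2, 4, 3, 3, 2))

# Hypothetical helper: count sick patients for hospital IDs 1..n_hospitals.
count_sick <- function(dt, n_hospitals) {
  # One lookup row per hospital ID, restricted to disease == 1
  lookup <- data.table(hospital = seq_len(n_hospitals), disease = 1)
  # by = .EACHI evaluates .N once per lookup row; unmatched rows give N = 0
  dt[lookup, on = .(hospital, disease), .N, by = .EACHI][, .(hospital, N)]
}

res <- count_sick(DT, 5)
```

Because the lookup table drives the join, the result has one row per requested hospital, in the order of the lookup table.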
Upvotes: 0
Reputation: 50668
I assume there is a mistake because sample data and expected output don't seem to match (see my comment above).
That aside, you could use table():
table(DT[, hospital := factor(hospital, 1:5)])[2, ]
#1 2 3 4 5
#0 1 2 1 0
Or perhaps you want the sum of the disease = 0 and disease = 1 counts?
colSums(table(DT[, hospital := factor(hospital, 1:5)]))
#1 2 3 4 5
#0 3 2 1 0
In both cases, the return object is a named int vector.
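Note that the `:=` in the calls above converts `hospital` to a factor in `DT` by reference. If that side effect is unwanted, and if the hospital IDs really are the integers 1 to some known maximum (an assumption, as is the variable name `n_hospitals`), base R's `tabulate()` may be a lighter alternative worth benchmarking inside a loop. A minimal sketch:

```r
library(data.table)

DT <- data.table(disease  = c(0, 0, 1, 1, 1, 1),
                 hospital = c(2, 2, 4, 3, 3, 2))

n_hospitals <- 5  # assumed: hospital IDs are the integers 1..n_hospitals

# tabulate() counts how often each integer 1..nbins occurs, zeros included
sick_counts <- tabulate(DT[disease == 1, hospital], nbins = n_hospitals)
sick_counts
# 0 1 2 1 0

# Optional: the same counts as a sorted data.table
res <- data.table(hospital = seq_len(n_hospitals), N = sick_counts)
```

Unlike `table()`, this returns an unnamed integer vector, so position `i` corresponds to hospital `i`.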
Upvotes: 1