Malou Storgaard

Reputation: 11

Showing rows of zeros when using .N

I have a data.table that looks roughly like this:

library(data.table)
DT <- data.table(disease = c(0,0,1,1,1,1),
   hospital = c(2,2,4,3,3,2))

Each row corresponds to one person admitted to a hospital: "hospital" is the hospital's ID number, and "disease" is the status of a specific disease (1 = sick, 0 = not sick).

I want to count the number of sick people in each hospital, including hospitals with no sick people AND hospitals that don't appear in this data.table at all, so that I can specify how many hospitals the final table should contain.

Using DT[disease == 1, .N, keyby = hospital], I get

   hospital N
1:        2 2
2:        3 1
3:        4 1

But if, for example, I want the number of hospitals to be five, the result (it does not have to be a data.table; a matrix would also do) should look like this:

   hospital N
1:        1 0
2:        2 2
3:        3 1
4:        4 1
5:        5 0

Preferably sorted by hospital. A plain vector of N's would also work, as long as it includes hospitals with zero cases (but then it definitely has to be sorted by hospital ID).

My data set is fairly large (and has other columns too), and this runs inside a loop, so it needs to be fast.

Thank you in advance.

Upvotes: 1

Views: 153

Answers (2)

s_baldur

Reputation: 33488

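# Join on every (hospital, disease = 1) pair; with by = .EACHI, .N counts
# the matching rows of DT for each pair and is 0 when nothing matches
DT[.(hospital = 1:5, disease = 1), on = .(hospital, disease), .N, by = .EACHI
   ][, .(hospital, N)]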

   hospital N
1:        1 0
2:        2 1
3:        3 2
4:        4 1
5:        5 0
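An equivalent two-step variant keeps the keyby aggregation from the question and then joins it onto the full list of hospital IDs, filling the gaps with 0. This is a minimal sketch, assuming hospital IDs run from 1 to n_hospitals (a hypothetical variable name):

```r
library(data.table)

DT <- data.table(disease  = c(0, 0, 1, 1, 1, 1),
                 hospital = c(2, 2, 4, 3, 3, 2))
n_hospitals <- 5L  # assumed: hospital IDs are the integers 1..n_hospitals

# Step 1: count sick patients per hospital that occurs in the data
counts <- DT[disease == 1, .N, keyby = hospital]

# Step 2: join onto the full list of IDs; rows follow all_ids (sorted),
# and hospitals absent from counts get N = NA
all_ids <- data.table(hospital = seq_len(n_hospitals))
res <- counts[all_ids, on = "hospital"]

# Step 3: turn the NAs into zero counts (by reference)
res[is.na(N), N := 0L]
res
```

The .EACHI version above does the counting and the join in one step; the two-step form may be easier to read because it reuses the query from the question unchanged.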

Upvotes: 0

Maurits Evers

Reputation: 50668

I assume there is a mistake, because the sample data and the expected output don't seem to match: in DT, hospital 2 has one sick patient and hospital 3 has two, whereas the expected output shows the reverse.

That aside, you could use table():

table(DT[, hospital := factor(hospital, 1:5)])[2, ]
#1 2 3 4 5
#0 1 2 1 0
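Note that := converts the hospital column of DT to a factor by reference, so DT stays modified after that call. A minimal sketch that leaves the data untouched subsets the sick patients first and builds the factor on the fly, which also avoids relying on the disease == 1 counts sitting in row 2 of the table. It uses plain vectors in place of DT$disease and DT$hospital and assumes hospital IDs 1 to 5, as in the question:

```r
# Plain vectors standing in for DT$disease and DT$hospital
disease  <- c(0, 0, 1, 1, 1, 1)
hospital <- c(2, 2, 4, 3, 3, 2)

# Keep only sick patients, then fix the levels to 1:5 so that hospitals
# without sick patients (or missing from the data) are counted as 0.
# The original hospital vector is not modified.
sick <- table(factor(hospital[disease == 1], levels = 1:5))
sick
```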

Or perhaps you want the sum of the disease = 0 and disease = 1 counts, i.e. the total number of admitted patients per hospital?

colSums(table(DT[, hospital := factor(hospital, 1:5)]))
#1 2 3 4 5
#0 3 2 1 0

In the first case, the return object is a named integer vector; colSums() returns a named numeric (double) vector.
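If only the sorted vector of counts is needed and speed matters (the question mentions a large table and a loop), base R's tabulate() is another option; it is a swapped-in technique, not part of the answers above. A minimal sketch, assuming hospital IDs are positive integers from 1 to n_hospitals (a hypothetical variable name) and using plain vectors in place of DT$disease and DT$hospital:

```r
# Plain vectors standing in for DT$disease and DT$hospital
disease  <- c(0, 0, 1, 1, 1, 1)
hospital <- c(2, 2, 4, 3, 3, 2)
n_hospitals <- 5L  # assumed: hospital IDs are the integers 1..n_hospitals

# tabulate() counts how often each integer 1..nbins occurs; element i of
# the result is the count for hospital i, and absent IDs give 0.
# IDs outside 1..nbins would be silently ignored.
n_sick <- tabulate(hospital[disease == 1], nbins = n_hospitals)
n_sick
#> [1] 0 1 2 1 0
```

tabulate() just increments integer bins, so it is usually very fast for this kind of count. Unlike table(), its result is unnamed; wrap it in setNames(n_sick, seq_len(n_hospitals)) if names are wanted.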

Upvotes: 1
