KT_1
KT_1

Reputation: 8494

Including NAs in summary tables

For a sample dataframe:

migration <- structure(list(area.old = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 
                                               2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 
                                               3L, NA, NA, NA), .Label = c("leeds", "london", "plymouth"), class = "factor"), 
                        area.new = structure(c(7L, 13L, 3L, 2L, 4L, 7L, 6L, 7L, 6L, 
                                               13L, 5L, 8L, 7L, 11L, 12L, 9L, 1L, 10L, 11L, NA, NA, NA, 
                                               NA, 7L, 6L, 6L), .Label = c("bath", "bristol", "cambridge", 
                                                                           "glasgow", "harrogate", "leeds", "london", "manchester", 
                                                                           "newcastle", "oxford", "plymouth", "poole", "york"), class = "factor"), 
                        persons = c(6L, 3L, 2L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 3L, 4L, 
                                    1L, 1L, 2L, 3L, 4L, 9L, 4L, 5L, 7L, 9L, 10L, 15L, 4L, 7L)), .Names = c("area.old", 
                                                                                                           "area.new", "persons"), class = c("data.table", "data.frame"), row.names = c(NA, 
                                                                                                                                                                                        -26L), .internal.selfref = <pointer: 0x0000000000220788>)

I wish to summarise the data into a couple of dataframes using the code:

moved.from <- migration[as.character(area.old)!=as.character(area.new), 
                 .(persons = sum(persons)), 
                 by=.(moved.from = area.old)]

moved.to <- migration[as.character(area.old)!=as.character(area.new), 
                 .(persons = sum(persons)), 
                 by=.(moved.to = area.new)]

This produces two summary tables, firstly detailing the total of people who have moved from the areas in 'area.old'. The second table lists the destinations people have moved to (in 'area.new'). This code was kindly suggested here (Producing smmary tables for very large datasets).

When I tried it out on my own data there was a problem as I wasn't telling R how to deal with the NAs in 'area.old' or 'area.new' columns. How could I modify this code to add all the NAs in as well (i.e. include them in a row at the bottom of the moved.from and moved.to data frames adding the total number of persons in the NAs)?

Any help on this would be very much appreciated.

Upvotes: 1

Views: 391

Answers (1)

David Arenburg
David Arenburg

Reputation: 92310

Just add | is.na as additional condition within each filter

migration[as.character(area.old) != 
            as.character(area.new) | 
            is.na(area.old),
          .(persons = sum(persons)), 
          by = .(moved.from = area.old)]

#    moved.from persons
# 1:     london      24
# 2:      leeds      17
# 3:   plymouth      19
# 4:         NA      26

And

migration[as.character(area.old) !=
            as.character(area.new) |
            is.na(area.new), 
          .(persons = sum(persons)), 
          by = .(moved.to = area.new)]

#       moved.to persons
#  1:       york       9
#  2:  cambridge       2
#  3:    bristol       5
#  4:    glasgow       6
#  5:      leeds       8
#  6:     london       5
#  7:  harrogate       3
#  8: manchester       4
#  9:      poole       2
# 10:  newcastle       3
# 11:       bath       4
# 12:     oxford       9
# 13:         NA      31

As a side note, I'd recommend to convert your two columns into character class and avoid calling as.character within each operation. The following should do

migration[, names(migration)[-3L] := lapply(.SD, as.character), .SDcols = -"persons"]

Now you can compare the area.old and area.new without calling as.character

Upvotes: 1

Related Questions