Reputation: 8494
For a sample dataframe:
migration <- structure(list(area.old = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L,
3L, NA, NA, NA), .Label = c("leeds", "london", "plymouth"), class = "factor"),
area.new = structure(c(7L, 13L, 3L, 2L, 4L, 7L, 6L, 7L, 6L,
13L, 5L, 8L, 7L, 11L, 12L, 9L, 1L, 10L, 11L, NA, NA, NA,
NA, 7L, 6L, 6L), .Label = c("bath", "bristol", "cambridge",
"glasgow", "harrogate", "leeds", "london", "manchester",
"newcastle", "oxford", "plymouth", "poole", "york"), class = "factor"),
persons = c(6L, 3L, 2L, 5L, 6L, 7L, 8L, 4L, 5L, 6L, 3L, 4L,
1L, 1L, 2L, 3L, 4L, 9L, 4L, 5L, 7L, 9L, 10L, 15L, 4L, 7L)), .Names = c("area.old",
"area.new", "persons"), class = c("data.table", "data.frame"), row.names = c(NA,
-26L), .internal.selfref = <pointer: 0x0000000000220788>)
I wish to summarise the data into a couple of dataframes using the code:
moved.from <- migration[as.character(area.old)!=as.character(area.new),
.(persons = sum(persons)),
by=.(moved.from = area.old)]
moved.to <- migration[as.character(area.old)!=as.character(area.new),
.(persons = sum(persons)),
by=.(moved.to = area.new)]
This produces two summary tables, firstly detailing the total of people who have moved from the areas in 'area.old'. The second table lists the destinations people have moved to (in 'area.new'). This code was kindly suggested here (Producing smmary tables for very large datasets).
When I tried it out on my own data there was a problem as I wasn't telling R how to deal with the NAs in 'area.old' or 'area.new' columns. How could I modify this code to add all the NAs in as well (i.e. include them in a row at the bottom of the moved.from and moved.to data frames adding the total number of persons in the NAs)?
Any help on this would be very much appreciated.
Upvotes: 1
Views: 391
Reputation: 92310
Just add | is.na
as additional condition within each filter
migration[as.character(area.old) !=
as.character(area.new) |
is.na(area.old),
.(persons = sum(persons)),
by = .(moved.from = area.old)]
# moved.from persons
# 1: london 24
# 2: leeds 17
# 3: plymouth 19
# 4: NA 26
And
migration[as.character(area.old) !=
as.character(area.new) |
is.na(area.new),
.(persons = sum(persons)),
by = .(moved.to = area.new)]
# moved.to persons
# 1: york 9
# 2: cambridge 2
# 3: bristol 5
# 4: glasgow 6
# 5: leeds 8
# 6: london 5
# 7: harrogate 3
# 8: manchester 4
# 9: poole 2
# 10: newcastle 3
# 11: bath 4
# 12: oxford 9
# 13: NA 31
As a side note, I'd recommend to convert your two columns into character class and avoid calling as.character
within each operation. The following should do
migration[, names(migration)[-3L] := lapply(.SD, as.character), .SDcols = -"persons"]
Now you can compare the area.old
and area.new
without calling as.character
Upvotes: 1