Adding a column using the data.table package in R

Question

For an example dataframe:

df = structure(list(
    country = c("AT", "AT", "AT", "BE", "BE", "BE",
                "DE", "DE", "DE", "DE", "DE", "DE", "DE",
                "DE", "DE", "DE", "DE", "DE", "DE", "DE"),
    level   = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
                "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
    region  = c("AT2", "AT1", "AT3", "BE2", "BE1", "BE3", "DE4",
                "DE3", "DE9", "DE7", "DE1", "DEE", "DEG", "DE2",
                "DED", "DEB", "DEA", "DEF", "DE6", "DE8"),
    N       = c("348", "707", "648", "952", "143", "584", "171",
                "155", "234", "176", "302", "144", "148", "386",
                "257", "126", "463", "74", "44", "119"),
    result  = c("24.43", "26.59", "20.37", "23.53", "16.78", "25.51",
                "46.2", "43.23", "41.03", "37.5", "33.44", "58.33",
                "47.97", "34.46", "39.69", "31.75", "36.93", "43.24",
                "36.36", "43.7")),
  .Names = c("country", "level", "region", "N", "result"),
  class = c("data.table", "data.frame"),
  row.names = c(NA, -20L))

I am using the following code to create a summary dataframe, listing the max and min values by country:

variable_country <- setDT(df)[order(country), list(min_result = min(result), max_result = max(result)), by = c("country")]
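One caveat with the code above: in the sample df, 'result' (and 'N') are stored as character strings, so min() and max() compare them as text rather than as numbers. Every value in the sample has two digits before the decimal point, so the text and numeric extremes happen to agree here, but that is not guaranteed in general. The sketch below uses a small hypothetical table ('toy', not taken from the question) to show the difference, and one possible fix: converting with as.numeric() inside j.

```r
library(data.table)

# Hypothetical toy table: like the question's df, 'result' is character
toy <- data.table(
  country = c("AT", "AT", "BE", "BE"),
  result  = c("9.5", "10.2", "100", "20")
)

# On character values, min()/max() compare strings: "10.2" sorts before "9.5"
chr_summary <- toy[order(country),
                   list(min_result = min(result), max_result = max(result)),
                   by = country]

# Converting to numeric inside j gives the numeric extremes instead
num_summary <- toy[order(country),
                   list(min_result = min(as.numeric(result)),
                        max_result = max(as.numeric(result))),
                   by = country]

print(chr_summary)
print(num_summary)
```

Converting inside j leaves the original column untouched; alternatively the column could be converted once up front with toy[, result := as.numeric(result)].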

I also wish to include the variable 'level' from 'df' in the summary. How would I do this in R? That is, my variable_country dataframe should have an extra column showing that these countries are at level 1. It should still have three observations (one for each country); all observations for a given country are at the same level.

akrun · Accepted Answer

If there is only a single 'level' for each 'country', we can create the summarised dataset by including the first observation of 'level' (level[1L]).

setDT(df)[order(country), list(min_result = min(result),
    max_result = max(result), level= level[1L]), by = country]
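A minimal sketch of the level[1L] approach, run on a few rows taken from the question and placed in a hypothetical table 'toy' (with 'result' converted to numeric for simplicity). The uniqueN() check is an added guard, not part of the original answer: it stops if any country has more than one level, in which case level[1L] would silently pick just one of them.

```r
library(data.table)

# Hypothetical subset of the question's rows ('result' made numeric here)
toy <- data.table(
  country = c("AT", "AT", "BE", "BE", "DE"),
  level   = c("1", "1", "1", "1", "1"),
  result  = c(24.43, 26.59, 23.53, 16.78, 46.2)
)

# Guard: every country should have exactly one distinct level
stopifnot(toy[, uniqueN(level), by = country][, all(V1 == 1L)])

# Summarise by country, carrying the (single) level along via level[1L]
summary_dt <- toy[order(country),
                  list(min_result = min(result),
                       max_result = max(result),
                       level      = level[1L]),
                  by = country]
print(summary_dt)
```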

Having said that, another option would be to use 'level' as an additional grouping variable, i.e. by = .(country, level) in the code (as suggested by @David Arenburg).
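A short sketch of the grouping alternative, again on a hypothetical table 'toy' built from a few of the question's rows (with 'result' made numeric). Grouping by both columns places 'level' in the output next to 'country' without needing level[1L].

```r
library(data.table)

# Hypothetical subset of the question's rows ('result' made numeric here)
toy <- data.table(
  country = c("AT", "AT", "BE", "BE", "DE"),
  level   = c("1", "1", "1", "1", "1"),
  result  = c(24.43, 26.59, 23.53, 16.78, 46.2)
)

# Group by country and level together; both appear as leading output columns
by_both <- toy[order(country),
               list(min_result = min(result), max_result = max(result)),
               by = .(country, level)]
print(by_both)
```

If a country ever had more than one level, this version would return one row per country/level combination instead of keeping only the first level, which may be preferable because no information is silently dropped.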
