R - sample within paired data

Question

I am trying to randomly sample a variable within paired data. idmen is my pair (couple) identifier, idind is my person identifier, and jour is the variable that needs to be randomly subset. jour needs to be the same within each idmen pair. For example, for idmen == 2, we need to keep either dimanche or vendredi.

This is the data

    idmen idind  jour actpr1
      1     1 lundi       111
      1     2 lundi       111
      2     1 dimanche    111
      2     2 dimanche    111
      2     1 vendredi    111
      2     2 vendredi    111
      3     1 dimanche    113
      3     2 dimanche    121
      3     1 lundi       111
      3     2 lundi       111

This is the desired output (of course, the output can vary because the day must be randomly selected).

I need to sample one day for each idmen.

     idmen idind  jour actpr1
      1     1 lundi       111
      1     2 lundi       111
      2     1 dimanche    111
      2     2 dimanche    111
      3     1 dimanche    113
      3     2 dimanche    121

I thought of something like

    library(dplyr)
    dta %>% group_by(idmen, jour) %>% sample_n(2)

But I do not understand why this is not working.

Any clue?

structure(list(idmen = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3), idind = c(1, 
 2, 1, 2, 1, 2, 1, 2, 1, 2), jour = structure(c(3L, 3L, 1L, 1L, 
 7L, 7L, 1L, 1L, 3L, 3L), .Label = c("dimanche", "jeudi   ", "lundi   ", 
 "mardi   ", "mercredi", "samedi  ", "vendredi"), class = "factor"), 
actpr1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 1L, 
1L), .Label = c("111", "112", "113", "121", "122", "123", 
"131", "132", "141", "143", "144", "145", "146", "151", "211", 
"212", "213", "223", "231", "233", "241", "261", "262", "271", 
"272", "311", "312", "313", "324", "331", "332", "334", "335", 
"341", "342", "343", "351", "372", "373", "374", "381", "382", 
"384", "385", "399", "411", "412", "413", "414", "419", "422", 
"423", "429", "431", "433", "510", "511", "512", "513", "514", 
"521", "522", "523", "524", "531", "532", "533", "541", "542", 
"613", "614", "616", "621", "622", "623", "627", "631", "632", 
"633", "634", "635", "636", "637", "638", "641", "651", "653", 
"655", "658", "661", "662", "663", "665", "667", "668", "669", 
"671", "672", "673", "674", "678", "810", "811", "812", "813", 
"819", "911", "999"), class = "factor")), .Names = c("idmen", 
 "idind", "jour", "actpr1"), row.names = c(NA, -10L), class = "data.frame")

joran · Accepted Answer

Maybe try this:

    > dta %>% group_by(idmen) %>% filter(jour == jour[sample(length(jour),1)])
    Source: local data frame [6 x 4]
    Groups: idmen [3]

      idmen idind     jour actpr1
      (dbl) (dbl)   (fctr) (fctr)
    1     1     1 lundi       111
    2     1     2 lundi       111
    3     2     1 vendredi    111
    4     2     2 vendredi    111
    5     3     1 lundi       111
    6     3     2 lundi       111
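
The one-liner above can also be split into two explicit steps: draw one day per idmen, then keep every row that matches the draw. The following is only a sketch under a few assumptions: dplyr 1.0.0 or later is installed (for slice_sample; sample_n(1) is the older equivalent), the data frame is a simplified stand-in for the question's dta with plain character columns, and picked and result are names made up for illustration. set.seed is there only to make the draw repeatable.

```r
library(dplyr)

# Simplified stand-in for the question's data (assumption: character columns
# instead of the original factors)
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi", "vendredi",
             "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c("111", "111", "111", "111", "111", "111", "113", "121", "111", "111"),
  stringsAsFactors = FALSE
)

set.seed(1)  # only for reproducibility of the example

# Step 1: one row per (idmen, jour) combination, then draw one day per idmen
picked <- dta %>%
  distinct(idmen, jour) %>%
  group_by(idmen) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Step 2: keep all rows of dta whose (idmen, jour) pair was drawn
result <- semi_join(dta, picked, by = c("idmen", "jour"))
```

One difference worth noting: the filter version draws a row index, so a day that appears on more rows of an idmen is more likely to be chosen, whereas this version draws uniformly among the distinct days. With the data in the question every day has exactly two rows per idmen, so the two behave the same.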

...although it would be kind of neat to have a "sample complete groups" function built into dplyr perhaps.
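
As far as this answer goes, no such built-in is assumed to exist, but a small helper along those lines is easy to sketch in base R. Everything here is hypothetical: the function name sample_by_group, its arguments, and the simplified stand-in for dta are made up for illustration.

```r
# Hypothetical helper: for each value of `group`, draw one value of `by`
# at random and keep every row that carries it.
sample_by_group <- function(data, group, by) {
  # drop = TRUE avoids empty pieces when `group` is a factor with unused levels
  pieces <- split(data, data[[group]], drop = TRUE)
  kept <- lapply(pieces, function(d) {
    choices <- unique(as.character(d[[by]]))
    # index with sample.int so a single choice is handled safely
    chosen <- choices[sample.int(length(choices), 1)]
    d[as.character(d[[by]]) == chosen, , drop = FALSE]
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

# Simplified stand-in for the question's data (assumption)
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi", "vendredi",
             "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c("111", "111", "111", "111", "111", "111", "113", "121", "111", "111"),
  stringsAsFactors = FALSE
)

res <- sample_by_group(dta, "idmen", "jour")
```

Each call returns both rows of one randomly chosen day for every idmen, so the result always has one day per pair.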
