giac

Reputation: 4309

R - sample within paired data

I am trying to randomly sample a variable within paired data. idmen is my pair (couple) identifier, idind is my person identifier, and jour is the variable that needs to be randomly subset. jour needs to be the same for both members of an idmen pair. So, for example, for idmen == 2 we need to keep either dimanche or vendredi.

This is the data

    idmen idind  jour actpr1
      1     1 lundi       111
      1     2 lundi       111
      2     1 dimanche    111
      2     2 dimanche    111
      2     1 vendredi    111
      2     2 vendredi    111
      3     1 dimanche    113
      3     2 dimanche    121
      3     1 lundi       111
      3     2 lundi       111

This is the desired output (of course the output can vary because it must be randomly selected)

I need to sample one day for each idmen.

     idmen idind  jour actpr1
      1     1 lundi       111
      1     2 lundi       111
      2     1 dimanche    111
      2     2 dimanche    111
      3     1 dimanche    113
      3     2 dimanche    121

I thought of something like

    library(dplyr)
    dta %>% group_by(idmen, jour) %>% sample_n(2)

But I do not understand why this is not working.

Any clue?

structure(list(idmen = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3), idind = c(1, 
 2, 1, 2, 1, 2, 1, 2, 1, 2), jour = structure(c(3L, 3L, 1L, 1L, 
 7L, 7L, 1L, 1L, 3L, 3L), .Label = c("dimanche", "jeudi   ", "lundi   ", 
 "mardi   ", "mercredi", "samedi  ", "vendredi"), class = "factor"), 
actpr1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 1L, 
1L), .Label = c("111", "112", "113", "121", "122", "123", 
"131", "132", "141", "143", "144", "145", "146", "151", "211", 
"212", "213", "223", "231", "233", "241", "261", "262", "271", 
"272", "311", "312", "313", "324", "331", "332", "334", "335", 
"341", "342", "343", "351", "372", "373", "374", "381", "382", 
"384", "385", "399", "411", "412", "413", "414", "419", "422", 
"423", "429", "431", "433", "510", "511", "512", "513", "514", 
"521", "522", "523", "524", "531", "532", "533", "541", "542", 
"613", "614", "616", "621", "622", "623", "627", "631", "632", 
"633", "634", "635", "636", "637", "638", "641", "651", "653", 
"655", "658", "661", "662", "663", "665", "667", "668", "669", 
"671", "672", "673", "674", "678", "810", "811", "812", "813", 
"819", "911", "999"), class = "factor")), .Names = c("idmen", 
 "idind", "jour", "actpr1"), row.names = c(NA, -10L), class = "data.frame")

Upvotes: 2

Views: 680

Answers (2)

Sam Dickson

Reputation: 5249

Here's a Base R solution:

    dta[unlist(sample(as.data.frame(matrix(1:nrow(dta), nrow = 2)), 10, replace = TRUE)), ]

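This takes advantage of the fact that a data frame is a list of columns. When you call sample() on a data frame, it draws entire columns, so each column of the index matrix (two consecutive row numbers) is drawn as a unit. Calling unlist() on the result gives the row indices, so the two rows of a pair are always sampled together. Note that this assumes each pair occupies two consecutive rows of dta. The example samples 10 pairs with replacement, but that can of course be changed.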
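To make the column-sampling idea easier to see, here is a minimal sketch on a toy data frame. The toy data, the variable names (pairs, picked, rows) and the choice to draw two pairs without replacement are assumptions for illustration only, and the sketch relies on each pair sitting on two consecutive rows.

```r
# Toy data (hypothetical): three pairs, each stored on two consecutive rows.
dta <- data.frame(id = rep(1:3, each = 2), person = rep(1:2, times = 3))

# Each column of this helper data frame holds the row indices of one pair:
# column 1 is c(1, 2), column 2 is c(3, 4), column 3 is c(5, 6).
pairs <- as.data.frame(matrix(seq_len(nrow(dta)), nrow = 2))

set.seed(1)  # only so that the draw is repeatable

# sample() on a data frame draws whole columns, i.e. whole pairs.
picked <- sample(pairs, 2)

# Flatten the drawn columns back into a plain vector of row indices.
rows <- unlist(picked, use.names = FALSE)

# The two rows of each drawn pair come out together.
dta[rows, ]
```

Because whole columns are drawn, the row indices always come out as (odd, odd + 1) couples, which is what keeps the two members of a pair together.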

Upvotes: 1

joran

Reputation: 173577

Maybe try this:

> dta %>% group_by(idmen) %>% filter(jour == jour[sample(length(jour),1)])
Source: local data frame [6 x 4]
Groups: idmen [3]

  idmen idind     jour actpr1
  (dbl) (dbl)   (fctr) (fctr)
1     1     1 lundi       111
2     1     2 lundi       111
3     2     1 vendredi    111
4     2     2 vendredi    111
5     3     1 lundi       111
6     3     2 lundi       111

...although it would be kind of neat to have a "sample complete groups" function built into dplyr perhaps.
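For what it's worth, the attempt in the question keeps every day because group_by(idmen, jour) makes each (idmen, jour) combination its own group, so sample_n(2) just returns both rows of every group. One way to "sample complete groups" without dplyr is sketched below in base R. This is only a sketch under some assumptions: the data frame is rebuilt here with plain character and numeric columns instead of the factors in the question, and the names chosen and result are made up for the example.

```r
# Data from the question, rebuilt with plain columns (assumption: the
# original uses factors, and its jour labels carry trailing spaces).
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi",
             "vendredi", "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c(111, 111, 111, 111, 111, 111, 113, 121, 111, 111),
  stringsAsFactors = FALSE
)

# For each idmen, draw one of its distinct days.
chosen <- tapply(as.character(dta$jour), dta$idmen, function(j) {
  u <- unique(j)
  u[sample.int(length(u), 1)]
})

# Look up the drawn day for every row by idmen and keep matching rows.
keep <- as.character(dta$jour) == unname(chosen[as.character(dta$idmen)])
result <- dta[keep, ]

result
```

Drawing from the distinct days gives each day of a household the same chance, whereas drawing a row index (as in the filter() version) weights a day by how many rows it has. With exactly two rows per day, as here, the two approaches behave the same.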

Upvotes: 3
