Reputation: 4309
I am trying to randomly sample a variable within paired data. idmen is my pair (couple) identifier, idind is my person identifier, and jour is the variable that needs to be randomly subset. jour needs to be the same for both members of an idmen pair. So, for example, for idmen == 2 we need to keep either dimanche or vendredi.
This is the data
idmen idind jour actpr1
1 1 lundi 111
1 2 lundi 111
2 1 dimanche 111
2 2 dimanche 111
2 1 vendredi 111
2 2 vendredi 111
3 1 dimanche 113
3 2 dimanche 121
3 1 lundi 111
3 2 lundi 111
This is the desired output (of course the output can vary, because it must be randomly selected). I need to sample one day for each idmen.
idmen idind jour actpr1
1 1 lundi 111
1 2 lundi 111
2 1 dimanche 111
2 2 dimanche 111
3 1 dimanche 113
3 2 dimanche 121
I thought of something like
library(dplyr)
dta %>% group_by(idmen, jour) %>% sample_n(2)
But I do not understand why this is not working.
Any clue?
structure(list(idmen = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3), idind = c(1,
2, 1, 2, 1, 2, 1, 2, 1, 2), jour = structure(c(3L, 3L, 1L, 1L,
7L, 7L, 1L, 1L, 3L, 3L), .Label = c("dimanche", "jeudi ", "lundi ",
"mardi ", "mercredi", "samedi ", "vendredi"), class = "factor"),
actpr1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 1L,
1L), .Label = c("111", "112", "113", "121", "122", "123",
"131", "132", "141", "143", "144", "145", "146", "151", "211",
"212", "213", "223", "231", "233", "241", "261", "262", "271",
"272", "311", "312", "313", "324", "331", "332", "334", "335",
"341", "342", "343", "351", "372", "373", "374", "381", "382",
"384", "385", "399", "411", "412", "413", "414", "419", "422",
"423", "429", "431", "433", "510", "511", "512", "513", "514",
"521", "522", "523", "524", "531", "532", "533", "541", "542",
"613", "614", "616", "621", "622", "623", "627", "631", "632",
"633", "634", "635", "636", "637", "638", "641", "651", "653",
"655", "658", "661", "662", "663", "665", "667", "668", "669",
"671", "672", "673", "674", "678", "810", "811", "812", "813",
"819", "911", "999"), class = "factor")), .Names = c("idmen",
"idind", "jour", "actpr1"), row.names = c(NA, -10L), class = "data.frame")
Upvotes: 2
Views: 680
Reputation: 5249
Here's a Base R solution:
dta[unlist(sample(as.data.frame(matrix(1:nrow(dta), nrow = 2)), 10, replace = TRUE)), ]
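This takes advantage of the fact that a data frame is a list. matrix(1:nrow(dta), nrow = 2) puts consecutive row numbers into columns, so each column holds one pair of rows (this assumes the rows are ordered so that the two members of a pair are adjacent). When you use sample() on a list it takes whole elements, which for a data frame means entire columns. Then just use unlist() on the result and you have sampled two rows together. This samples 10 pairs with replacement, but that can be changed of course.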
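To make the column-sampling trick easier to follow, here is a small sketch that breaks it into steps. The toy data is a simplified copy of the question's data (jour and actpr1 as plain vectors instead of factors), and the names idx, picked and rows are just illustrative. Note that it draws pairs of adjacent rows and does not by itself guarantee one day per idmen.

```r
# Simplified copy of the question's data (assumption: plain vectors, no factors)
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi",
             "vendredi", "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c(111, 111, 111, 111, 111, 111, 113, 121, 111, 111),
  stringsAsFactors = FALSE
)

set.seed(1)  # only to make the draw repeatable

# Each column of idx holds the row numbers of one adjacent pair: (1,2), (3,4), ...
idx <- as.data.frame(matrix(seq_len(nrow(dta)), nrow = 2))

# sample() on a data frame draws whole columns, i.e. whole pairs
picked <- sample(idx, 3)

# Flatten the drawn columns back into a vector of row numbers
rows <- unlist(picked, use.names = FALSE)

res <- dta[rows, ]
```

Here three pairs are drawn without replacement; pass replace = TRUE to sample() to allow the same pair more than once.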
Upvotes: 1
Reputation: 173577
Maybe try this:
> dta %>% group_by(idmen) %>% filter(jour == jour[sample(length(jour),1)])
Source: local data frame [6 x 4]
Groups: idmen [3]
idmen idind jour actpr1
(dbl) (dbl) (fctr) (fctr)
1 1 1 lundi 111
2 1 2 lundi 111
3 2 1 vendredi 111
4 2 2 vendredi 111
5 3 1 lundi 111
6 3 2 lundi 111
...although it would be kind of neat to have a "sample complete groups" function built into dplyr perhaps.
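In the meantime, something like that can be sketched as a small base R helper. sample_complete_groups is a made-up name, not a function from dplyr or base R, and the toy data is a simplified copy of the question's data with plain vectors instead of factors. Unlike the filter() line above, which draws a row and so weights each day by how many rows it has, this draws uniformly among the distinct days of each idmen (the two are equivalent when every day has the same number of rows, as here).

```r
# Simplified copy of the question's data (assumption: plain vectors, no factors)
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi",
             "vendredi", "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c(111, 111, 111, 111, 111, 111, 113, 121, 111, 111),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each value of `group`, draw one value of `within`
# at random and keep every row of that group carrying the drawn value.
sample_complete_groups <- function(df, group, within) {
  pieces <- split(df, df[[group]], drop = TRUE)
  kept <- lapply(pieces, function(g) {
    vals <- unique(as.character(g[[within]]))
    # Index with sample.int() so a single candidate value is handled safely
    pick <- vals[sample.int(length(vals), 1)]
    g[as.character(g[[within]]) == pick, , drop = FALSE]
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

res <- sample_complete_groups(dta, "idmen", "jour")
```

Each call returns one complete day per idmen, so with this data the result always has six rows, although which days are kept changes from run to run.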
Upvotes: 3