Qiang Li

Reputation: 10855

select unique values with equal probability

I have a data frame like the following:

c1 c2
1 2
1 3
2 4
2 5
2 2
3 1
3 2
...

I want one row per unique c1 value. When several rows share the same c1 value, c2 should be chosen among those rows with equal probability. For example, the final result could be:

c1 c2
1 2
2 2
3 2
...

"A random choice of c2 for each possible value of c1" is what I want.

Upvotes: 0

Views: 138

Answers (2)

Sven Hohenstein

Reputation: 81693

Here's an easy way to sample one value of c2 for each unique value of c1:

# dat is the name of your data frame; indexing by position keeps single-row groups intact
aggregate(c2 ~ c1, dat, function(v) v[sample.int(length(v), 1)])

  c1 c2
1  1  2
2  2  4
3  3  1
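
One caveat worth keeping in mind when sampling within groups in R: calling `sample(v, 1)` on a vector `v` that holds a single number `n` (with `n >= 1`) draws from `1:n` rather than returning `n`, so a c1 value that occurs in only one row could come back with a c2 that is not in the data. The sketch below sidesteps this by sampling a position instead of a value. The data frame and the helper name `pick_one` are assumptions made up for illustration, not part of the original answer.

```r
# Hypothetical data: the group c1 == 3 has a single row, with c2 == 5.
dat <- data.frame(c1 = c(1, 1, 2, 2, 2, 3),
                  c2 = c(2, 3, 4, 5, 2, 5))

# Illustrative helper: choose one element uniformly by position, so a
# length-one vector always returns its only element.
pick_one <- function(v) v[sample.int(length(v), 1)]

set.seed(1)  # only to make the run reproducible
res <- aggregate(c2 ~ c1, dat, pick_one)
print(res)
```

With this helper the row for `c1 == 3` always carries `c2 == 5`, whereas passing `sample` with size 1 directly could yield any integer from 1 to 5 for that group.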

Upvotes: 0

Stefan Wager

Reputation: 126

Here's a simple way to do it. Let's say your dataframe is called df.

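x <- unique(df$c1)
# Sample a position rather than a value, so a group with one row returns that row's c2
pick <- function(arg) { v <- df$c2[df$c1 == arg]; v[sample.int(length(v), 1)] }
y <- sapply(x, pick)
new_df <- data.frame(c1 = x, c2 = y)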
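
To check the "equal probability" requirement empirically, one option is to wrap the per-group sampling in a function and tabulate many repeated draws for one group. The sketch below assumes only the seven rows shown in the question (the elided rows are not reproduced), and `sample_per_group` is a hypothetical name introduced here.

```r
# Rows shown in the question; the rows elided there are left out.
df <- data.frame(c1 = c(1, 1, 2, 2, 2, 3, 3),
                 c2 = c(2, 3, 4, 5, 2, 1, 2))

# Illustrative helper: one uniformly chosen c2 per unique c1.
sample_per_group <- function(d) {
  x <- unique(d$c1)
  y <- sapply(x, function(arg) {
    v <- d$c2[d$c1 == arg]
    v[sample.int(length(v), 1)]  # sample a position, not a value
  })
  data.frame(c1 = x, c2 = y)
}

set.seed(42)  # only to make the run reproducible
# Repeat the sampling and record the c2 chosen for c1 == 2 each time.
draws <- replicate(3000, {
  r <- sample_per_group(df)
  r$c2[r$c1 == 2]
})
freq <- table(draws) / length(draws)
print(round(freq, 2))
```

For `c1 == 2` the candidates are 4, 5 and 2, so each relative frequency should land close to one third if the sampling is uniform.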

Upvotes: 1
