Using Pandas to sample a DataFrame by a specific column's weight

Question

I have a DataFrame that looks like:

  index  name   city
  0      Yam    Hadera
  1      Meow   Hadera
  2      Don    Hadera
  3      Jazz   Hadera
  4      Bond   Tel Aviv
  5      James  Tel Aviv

I want Pandas to randomly choose rows, weighted by how often each value appears in the city column (roughly what df.city.value_counts() gives). So the result of a hypothetical magic function, say:

df.magic_sample(3, weight_column='city')

might look like:

  0     Yam      Hadera
  1     Meow     Hadera
  2     Bond     Tel Aviv
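To make the target concrete, the sketch below rebuilds the example frame and turns df.city.value_counts() into the number of rows each city would contribute to a sample of 3. The frame construction and the variable n are illustrative, and rounding the scaled shares to whole rows is an assumption about what "weighted" should mean here.

```python
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    "name": ["Yam", "Meow", "Don", "Jazz", "Bond", "James"],
    "city": ["Hadera"] * 4 + ["Tel Aviv"] * 2,
})

n = 3  # total number of rows wanted in the sample

# Share of each city in the frame (4/6 and 2/6), scaled to the sample size
shares = df.city.value_counts(normalize=True)
per_city = (shares * n).round().astype(int)
print(per_city.to_dict())  # {'Hadera': 2, 'Tel Aviv': 1}
```

With 4 of 6 rows in Hadera and 2 of 6 in Tel Aviv, a sample of 3 should hold 2 Hadera rows and 1 Tel Aviv row, matching the example output above.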

Thanks! :)

akuiper · Accepted Answer

You can group by city and then sample from each group a number of rows proportional to its size relative to the whole data frame (rounded, because sample expects an integer n):

# each city contributes round(3 * group size / total rows) rows
df.groupby('city', group_keys=False).apply(lambda g: g.sample(round(3 * len(g) / len(df))))
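A self-contained sketch of the same proportional (stratified) idea, wrapped in a function that borrows the question's hypothetical magic_sample name; the random_state parameter and the sample data are illustrative. It uses GroupBy.sample with frac, available in pandas 1.1 and later, which draws round(frac * group size) rows from each group without needing apply.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Yam", "Meow", "Don", "Jazz", "Bond", "James"],
    "city": ["Hadera"] * 4 + ["Tel Aviv"] * 2,
})


def magic_sample(frame, n, weight_column, random_state=None):
    """Hypothetical helper: draw about n rows, with each value of
    weight_column contributing rows in proportion to its frequency."""
    frac = n / len(frame)
    # GroupBy.sample rounds frac * len(group) to get each group's row count
    return frame.groupby(weight_column).sample(frac=frac, random_state=random_state)


sample = magic_sample(df, 3, weight_column="city", random_state=0)
print(sample)
print(sample.city.value_counts().to_dict())  # {'Hadera': 2, 'Tel Aviv': 1}
```

Because each group's count is rounded separately, the total can differ slightly from n when the per-group shares are not whole numbers; the groupby/apply version above behaves the same way.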
