Reputation: 893
Does anyone know how pandas.DataFrame.sample normalizes the weights? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html
For example, if I pass raw counts as the weights for each row, does it just compute something like [count1/sum_counts, count2/sum_counts, ...]? Or does it apply something like softmax? https://en.wikipedia.org/wiki/Softmax_function
Upvotes: 1
Views: 1515
Reputation: 1261
Based on the pandas source code for DataFrame.sample, your first guess is correct: the weights are divided by their sum ([count1/sum_counts, count2/sum_counts, ...]), and no softmax is applied:
    # Renormalize if don't sum to 1
    if weights.sum() != 1:
        if weights.sum() != 0:
            weights = weights / weights.sum()
        else:
            raise ValueError("Invalid weights: weights sum to zero")
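
If you want to check this yourself, here is a minimal sketch that mirrors the divide-by-sum logic quoted above and compares it against what DataFrame.sample actually draws. The helper name normalize_like_sample, the toy DataFrame, and the column names are all made up for illustration. They are not part of pandas.

```python
import pandas as pd


def normalize_like_sample(weights):
    """Hypothetical helper: divide weights by their sum, as in the quoted snippet."""
    weights = pd.Series(weights, dtype="float64")
    total = weights.sum()
    if total == 0:
        raise ValueError("Invalid weights: weights sum to zero")
    return weights / total


# Toy data (assumed for illustration): raw counts used directly as weights.
df = pd.DataFrame({"label": ["a", "b", "c"], "count": [0, 1, 3]})

# Expected sampling probabilities under plain sum normalization.
probs = normalize_like_sample(df["count"])

# Draw many rows with replacement, weighting by the raw counts column.
drawn = df.sample(n=20000, replace=True, weights="count", random_state=0)

# Observed share of each label, with 0 filled in for labels never drawn.
observed = (
    drawn["label"]
    .value_counts(normalize=True)
    .reindex(df["label"], fill_value=0.0)
)

print(probs.tolist())
print(observed.to_dict())
```

The row with weight 0 should never be drawn, and the other two should show up at roughly 25% and 75%. Under softmax the zero-weight row would still get a nonzero probability, so this is an easy way to tell the two apart.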
Upvotes: 3