Peter L.
Peter L.

Reputation: 7304

Accurate realtime updating solution for distribution prediction based on current statistics

As a part of my current project I need to find a pretty Excel solution for the following: there's online game where people gain experience (similar to SO rep) and move to the next level as soon as enough exp is gained (again, this is very much like SO privileges). Sure thing this requires different time for everyone: some get enough exp during the 1st week, others need 5 weeks to pass the level. Statistics is the following:

Week from join: Week 1  Week 2  Week 3  Week 4  Week 5
Level up:       5%      15%     40%     30%     10%

That is, if during this week we had some people joined the game, 5 weeks after all of them will pass to the next level (some will give up and leave, but for the simplicity I do not include that factor in the list of conditions).

As for every game, the number of current players matters. To keep the game popular and exciting, there should be definite minimal number of players on each level. Obviously, this value is changing constantly: new people come, while others move to the next level. However - and that's where the action begins! - this minimal value of current players is NOT constant and may wary depending on many factors. For the simplicity it means that for future 10-15 weeks there's a set of predicted minimal amounts of players for each week that should be achieved (this is based on collected statistics, but for the given task this is a known set).

So far so good, the picture looks like this:

Sample 1

While in the leftmost column there are numbers of people joined the game each week, row of bold numbers under each week is the number of people passed during this week to the next level.

Having numbers of newcomers, the calculation of total resulting level ups for each week is trivial. The opposite is not at all such easy (well, at least for me) - and that IS the question: how to predict the required number of newcomers for each week (leftmost column) - to obtain the desired number of people who will pass to the next level during next weeks (bold row in the bottom)?

I still can't find an appropriate and accurate solution for 2 weeks so far. I tried all possible options - from complex & array formulas to Solver addin, but no luck: I always have more unknowns than equations. However, I'm pretty sure this is not an impossible task: for some solutions Solver could provide numbers fairly close to the desired, but not for every set of desired values.

Some assumptions and (perhaps) important conditions:

  1. Though should not be a problem, but actual weekly distribution %%% numbers are updated on demand basis via DB query and further Excel processing.
  2. Actual newcomer values available so far from the DB are of course considered as known values for the equations. Obviously they cannot be changed, and e.g. if one week less than desired number of people joined the game in fact, that should affect calculated predictions for future weeks to achieve overall.
  3. Total sum of players actually passed to the next level should remain unchanged for the certain time period: e.g. if we want that during each winter week starting Jan, 1 at least 100 people join the game, that gives us 8 full weeks and therefore 800 gamers in total. However, in case in January we actually got 80 new players each week - it means hard work for February) For certainty "checkpoint" subtotals are calculated every 10 weeks. In any case, any low values must be compensated during several upcoming weeks.
  4. The opposite is also true: if we have more newcomers than expected for the given week - we may attract less players during next week(s) to meet the next subtotal goal.
  5. The flow of players is infinite - picture sample is given just for the purpose of clarity and better understanding. Desired values may be calculated for as long in the future as required - prediction pattern is repeatable. If that matters for the solution - let it be for 10 weeks from current.
  6. Pure formula solution is preferable - even very complicated. However, VBA is still an acceptable option. In any case, accurate realtime / on demand update of calculated values is absolutely better than simple, but semi-manual or approximate solution.

As for my every question here - I don't ask for the ready to play solution, but the right guidance or a valuable advice will be very much appreciated. I deliberately do not post this to https://stats.stackexchange.com/ - it's more like programming question, since the theory is obviously simple... or perhaps too simple, so that I don't see quite obvious solution.

Please feel free to request any additional details or explanations via comments: although I tried to be as clear and simple as possible, some aspects that are not important for me may be in fact vital for the accurate solution.

Sample file (with the above sample): https://www.dropbox.com/s/q3bnagyet86i34r/StatPrognosis01.xlsx

Upvotes: 4

Views: 285

Answers (1)

Peter Albert
Peter Albert

Reputation: 17475

No need for VBA or complex array formulas - some simple math converted to formulas will be all you need:

The main idea is, that you can determine the number for each week by "reversing" the calculation. The logic is the following: (Note: cell references based on the screen shot, i.e. the yellow 5% would be B2):

  • If 5% of one week's "joiners" make it to the next level and week 1 (overall) should have 10 people to make it, 10/5% = 200 people need to join in the first week.
    As a formula, this means: B5:=B12, A5:=B5/B2
  • From there you can calculate how the joiners from the first week distribute as you already do (i.e. no change to the rest of row 5)
  • In week 2, you want 35 people to be the next level. From the former step you know that from week 1 joiners you have 30 people reach that level. Therefore, 5 people must come from week 2 joiners. As they again make 5% of that wave, the total wave must be 5/5%=100
    In formulas: C6:=C12-C5, A6: =C6/B2
  • ...

Now instead of creating an individual formula for each week, you can generalize this with some INDEX functions. For this, I extended your example by the week numbers as both header columns&rows:

enter image description here

To calculate the number per "joiner wave per week", i.e. range C5:O17, use the following formula:

=IF($A5>C$4+N("Nobody from this wave joined yet:"),0,
    IF(C$4=$A5+N("The first week for the new wave:"),
        IF($A5=1+N("Very first wave must be expected number in this week:"),C18,
        N("All other weeks, this must be expected number minus 
          'levelups' from former weeks:")+(C$18-SUM(C4:C$5))
    ),
    N("For any later week after the joining, apply weekly uplevel ratio
       to total number of joiners:")+INDEX($C$2:$O$2,C$4-$A5+1)*$B5)
)

and for the desired number of joiners per week, i.e. range B5:B17:

=INDEX($C5:$O5,$A5)/$C$2

Here's the updated solution file!

Upvotes: 1

Related Questions