JustinM

Reputation: 1013

Confusion over behavior of Publish().RefCount()

I've got a simple program here that displays the number of letters in various words. It works as expected.

static void Main(string[] args) {
    var word = new Subject<string>();
    var wordPub = word.Publish().RefCount();
    var length = word.Select(i => i.Length);
    var report =
        wordPub
        .GroupJoin(length,
            s => wordPub,
            s => Observable.Empty<int>(),
            (w, a) => new { Word = w, Lengths = a })
        .SelectMany(i => i.Lengths.Select(j => new { Word = i.Word, Length = j }));
    report.Subscribe(i => Console.WriteLine($"{i.Word} {i.Length}"));
    word.OnNext("Apple");
    word.OnNext("Banana");
    word.OnNext("Cat");
    word.OnNext("Donkey");
    word.OnNext("Elephant");
    word.OnNext("Zebra");
    Console.ReadLine();
}

And the output is:

Apple 5
Banana 6
Cat 3
Donkey 6
Elephant 8
Zebra 5

I used Publish().RefCount() because "wordPub" is included in "report" twice. Without it, when a word is emitted, first one part of the report would be notified by a callback and then the other part, doubling the notifications. That is kind of what happens: the output ends up having 11 items rather than 6. At least that is what I think is going on. I think of Publish().RefCount() in this situation as updating both parts of the report simultaneously.

However if I change the length function to ALSO use the published source like this:

var length = wordPub.Select(i => i.Length);

Then the output is this:

Apple 5
Apple 6
Banana 6
Cat 3
Banana 3
Cat 6
Donkey 6
Elephant 8
Donkey 8
Elephant 5
Zebra 5

Why can't the length function also use the same published source?

Upvotes: 5

Views: 178

Answers (4)

JustinM

Reputation: 1013

I tried using a regular Join instead of GroupJoin. I thought the problem was a race condition inside Join, between creating a new window and ending the current one, whenever a new word arrived. So here I tried to eliminate that by pairing every word with a null that signifies the end of the window. It doesn't work, just as the first version did not. How is it possible that a new window is created for each word without the previous one being closed first? I am completely confused.

static void Main(string[] args) {
    var lgr = new DelegateLogger(Console.WriteLine);
    var word = new Subject<string>();
    var wordDelimited =
        word
        .Select(i => Observable.Return<string>(null).StartWith(i))
        .SelectMany(i => i);
    var wordStart = wordDelimited.Where(i => i != null);
    var wordEnd = wordDelimited.Where(i => i == null);
    var report = Observable
        .Join(
            wordStart.Log(lgr, "word"), // starts window
            wordStart.Select(i => i.Length),
            s => wordEnd.Log(lgr, "expireWord"), // ends current window
            s => Observable.Empty<int>(),
            (l, r) => new { Word = l, Length = r });
    report.Subscribe(i => Console.WriteLine($"{i.Word} {i.Length}"));
    word.OnNext("Apple");
    word.OnNext("Banana");
    word.OnNext("Cat");
    word.OnNext("Zebra");
    word.OnNext("Elephant");
    word.OnNext("Bear");
    Console.ReadLine();
}

Upvotes: 0

JustinM

Reputation: 1013

Because GroupJoin seems to be very tricky to work with, here is another approach for correlating the inputs and outputs of functions.

static void Main(string[] args) {
    var word = new Subject<string>();
    var length = new Subject<int>();
    var report =
        word
        .CombineLatest(length, (w, l) => new { Word = w, Length = l })
        .Scan((a, b) => new { Word = b.Word, Length = a.Word == b.Word ? b.Length : -1 })
        .Where(i => i.Length != -1);
    report.Subscribe(i => Console.WriteLine($"{i.Word} {i.Length}"));
    word.OnNext("Apple"); length.OnNext(5);
    word.OnNext("Banana");
    word.OnNext("Cat"); length.OnNext(3);
    word.OnNext("Donkey");
    word.OnNext("Elephant"); length.OnNext(8);
    word.OnNext("Zebra"); length.OnNext(5);
    Console.ReadLine();
}

This approach works if every input has 0 or more outputs subject to the constraints that (1) outputs only arrive in the same order as the inputs AND (2) each output corresponds to its most recent input. This is like a LeftJoin - each item in the first list (word) is paired with items in the right list (length) that subsequently arrive, up until another item in the first list is emitted.
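To make the pairing rule above concrete, here is a minimal sketch in plain C# with no Rx involved. The `LatestWordPairing` class and its `Pair` method are hypothetical names introduced for illustration only. The sketch models the two subjects as one interleaved event list, where an event is either a word or a length, and pairs each length with the most recent word.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Rx): models the "each output belongs to
// its most recent input" rule on a plain, already-interleaved event list.
public static class LatestWordPairing
{
    // Each event is either a word (Length is null) or a length (Word is null).
    public static List<string> Pair(IEnumerable<(string Word, int? Length)> events)
    {
        var result = new List<string>();
        string latestWord = null;
        foreach (var e in events)
        {
            if (e.Word != null)
            {
                latestWord = e.Word;                            // a new input arrived
            }
            else if (latestWord != null && e.Length.HasValue)
            {
                result.Add($"{latestWord} {e.Length.Value}");   // output for the latest input
            }
        }
        return result;
    }
}
```

Fed the same sequence as the code above (Apple, 5, Banana, Cat, 3, Donkey, Elephant, 8, Zebra, 5), `Pair` returns "Apple 5", "Cat 3", "Elephant 8" and "Zebra 5". Words with no length before the next word (Banana, Donkey) produce nothing.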

Upvotes: 0

Lee Campbell

Reputation: 10783

This was a great challenge to solve! The conditions under which this happens are very subtle. Apologies in advance for the long explanation, but bear with me!

TL;DR

Subscriptions to the published source are processed in order, but before any other subscription made directly to the unpublished source, i.e. you can jump the queue! With GroupJoin, subscription order matters because it determines when windows open and close.


My first concern would be that you are applying Publish().RefCount() to a subject. This should be a no-op, because Subject<T> has no subscription cost.

So when you remove the Publish().RefCount() :

var word = new Subject<string>();
var wordPub = word;//.Publish().RefCount();
var length = word.Select(i => i.Length);

then you get the same issue.

So then I look to the GroupJoin (because my intuition suggests that Publish().RefCount() is a red herring). For me, eyeballing this alone was too hard to rationalise, so I lean on a simple debugging tool I have used dozens of times over the years - a Trace or Log extension method.

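// Namespaces used below (Rx types come from the System.Reactive package).
using System;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Threading;

public interface ILogger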
{
    void Log(string input);
}
public class DumpLogger : ILogger
{
    public void Log(string input)
    {
        //LinqPad `Dump()` extension method. 
        //  Could use Console.Write instead.
        input.Dump();
    }
}


public static class ObservableLoggingExtensions
{
    private static int _index = 0;

    public static IObservable<T> Log<T>(this IObservable<T> source, ILogger logger, string name)
    {
        return Observable.Create<T>(o =>
        {
            var index = Interlocked.Increment(ref _index);
            var label = $"{index:0000}{name}";
            logger.Log($"{label}.Subscribe()");
            var disposed = Disposable.Create(() => logger.Log($"{label}.Dispose()"));
            var subscription = source
                .Do(
                    x => logger.Log($"{label}.OnNext({x})"), // {x} is null-safe, unlike x.ToString()
                    ex => logger.Log($"{label}.OnError({ex})"),
                    () => logger.Log($"{label}.OnCompleted()")
                )
                .Subscribe(o);

            return new CompositeDisposable(subscription, disposed);
        });
    }
}

When I add the logging to your provided code it looks like this:

var logger = new DumpLogger();

var word = new Subject<string>();
var wordPub = word.Publish().RefCount();
var length = word.Select(i => i.Length);
var report =
    wordPub.Log(logger, "lhs")
    .GroupJoin(word.Select(i => i.Length).Log(logger, "rhs"),
        s => wordPub.Log(logger, "lhsDuration"),
        s => Observable.Empty<int>().Log(logger, "rhsDuration"),
        (w, a) => new { Word = w, Lengths = a })
    .SelectMany(i => i.Lengths.Select(j => new { Word = i.Word, Length = j }));
report.Subscribe(i => ($"{i.Word} {i.Length}").Dump("OnNext"));
word.OnNext("Apple");
word.OnNext("Banana");
word.OnNext("Cat");
word.OnNext("Donkey");
word.OnNext("Elephant");
word.OnNext("Zebra");

This then outputs something like the following in my log:

Log with Publish().RefCount() used

0001lhs.Subscribe()             
0002rhs.Subscribe()             
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe()     
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()

    OnNext
    Apple 5 

0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe()     
0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose()       
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()

    OnNext
    Banana 6 
...

However, when I remove the use of Publish().RefCount(), the new log output is as follows:

Log with only the Subject (no Publish().RefCount())

0001lhs.Subscribe()                 
0002rhs.Subscribe()                 
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe()         
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()

    OnNext
    Apple 5 

0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe()         
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()

    OnNext
    Apple 6 

    OnNext
    Banana 6 

0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose()
...

This gives us some insight, but the issue really becomes clear when we start annotating our logs with a logical list of subscriptions.

In the original (working) code with the RefCount, our annotations might look like this:

//word.Subscribers.Add(wordPub)

0001lhs.Subscribe()             //wordPub.Subscribers.Add(0001lhs)
0002rhs.Subscribe()             //word.Subscribers.Add(0002rhs)
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe()     //wordPub.Subscribers.Add(0003lhsDuration)
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()

    OnNext
    Apple 5 

0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe()     //wordPub.Subscribers.Add(0005lhsDuration)
0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose()       //wordPub.Subscribers.Remove(0003lhsDuration)
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()

    OnNext
    Banana 6 

So in this example, when word.OnNext("Banana"); is executed the chain of observers is linked in this order

  1. wordPub
  2. 0002rhs

However, wordPub has child subscriptions! So the real subscription list looks like

  1. wordPub
    1. 0001lhs
    2. 0003lhsDuration
    3. 0005lhsDuration
  2. 0002rhs

If we annotate the Subject-only log, we see where the subtlety lies:

0001lhs.Subscribe()                 //word.Subscribers.Add(0001lhs)
0002rhs.Subscribe()                 //word.Subscribers.Add(0002rhs)
0001lhs.OnNext(Apple)
0003lhsDuration.Subscribe()         //word.Subscribers.Add(0003lhsDuration)
0002rhs.OnNext(5)
0004rhsDuration.Subscribe()
0004rhsDuration.OnCompleted()
0004rhsDuration.Dispose()

    OnNext
    Apple 5 

0001lhs.OnNext(Banana)
0005lhsDuration.Subscribe()         //word.Subscribers.Add(0005lhsDuration)
0002rhs.OnNext(6)
0006rhsDuration.Subscribe()
0006rhsDuration.OnCompleted()
0006rhsDuration.Dispose()

    OnNext
    Apple 6 

    OnNext
    Banana 6 

0003lhsDuration.OnNext(Banana)
0003lhsDuration.Dispose()

So in this example, when word.OnNext("Banana"); is executed the chain of observers is linked in this order

1. 0001lhs
2. 0002rhs
3. 0003lhsDuration
4. 0005lhsDuration

Because the 0003lhsDuration subscription was made after 0002rhs, it won't see the "Banana" value that terminates the "Apple" window until after the rhs has been sent that value. The rhs therefore yields the length 6 into the still-open "Apple" window, which is where "Apple 6" comes from.
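The ordering effect can be reproduced without Rx at all. What follows is a rough sketch under stated assumptions, not the real GroupJoin or Subject<T> implementation: `MiniSubject` and `WindowOrderDemo` are hypothetical names, the subject is modelled as a list of handlers dispatched over a snapshot, and the join is reduced to "lhs opens a window, the duration handler closes it on the next word, rhs pairs a length with every window still open". The order of results within one word may differ from what the real operator produces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a subject: OnNext walks a snapshot of the handler
// list, so a handler added during dispatch misses the value in flight.
public sealed class MiniSubject
{
    private readonly List<Action<string>> _handlers = new List<Action<string>>();

    public Action Subscribe(Action<string> handler)
    {
        _handlers.Add(handler);
        return () => _handlers.Remove(handler);   // unsubscribe action
    }

    public void OnNext(string value)
    {
        foreach (var handler in _handlers.ToArray()) handler(value);
    }
}

public static class WindowOrderDemo
{
    // lhsThroughHub = true models wordPub: lhs and its duration handlers live
    // in a nested list that occupies a single, early slot in word's list.
    public static List<string> Run(bool lhsThroughHub)
    {
        var output = new List<string>();
        var open = new List<string>();            // currently open windows
        var word = new MiniSubject();
        var lhsSource = word;
        if (lhsThroughHub)
        {
            lhsSource = new MiniSubject();
            word.Subscribe(lhsSource.OnNext);     // hub subscribes before rhs
        }

        // lhs: open a window, then register the handler that will close it.
        lhsSource.Subscribe(w =>
        {
            open.Add(w);
            Action unsubscribe = () => { };
            unsubscribe = lhsSource.Subscribe(_ => { open.Remove(w); unsubscribe(); });
        });

        // rhs: subscribed directly to word, after lhs.
        word.Subscribe(w =>
        {
            foreach (var o in open) output.Add($"{o} {w.Length}");
        });

        foreach (var w in new[] { "Apple", "Banana", "Cat" }) word.OnNext(w);
        return output;
    }
}
```

In this model, `WindowOrderDemo.Run(true)` yields "Apple 5", "Banana 6", "Cat 3", because each closing handler sits in the nested list and runs before rhs. `WindowOrderDemo.Run(false)` yields "Apple 5", "Apple 6", "Banana 6", "Banana 3", "Cat 3", because each closing handler is appended after rhs in the flat list and so runs too late.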

Whew

As @francezu13k50 points out, the obvious and simple solution to your problem is to just use word.Select(x => new { Word = x, Length = x.Length });. However, I think you have given us a simplified version of your real problem (appreciated), so I understand why this isn't suitable. As I don't know what your real problem space is, I am not sure what to suggest, except that you already have a working solution in your current code, and now you should know why it works the way it does.

Upvotes: 3

francezu13k50

Reputation: 77

RefCount returns an Observable that stays connected to the source as long as there is at least one subscription to the returned Observable. When the last subscription is disposed, RefCount disposes its connection to the source, and it reconnects when a new subscription is made. It might be the case with your report query that all subscriptions to 'wordPub' are disposed before the query is fulfilled.

Instead of the complicated GroupJoin query you could simply do:
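The connect and disconnect bookkeeping described above can be sketched in a few lines of plain C#. This is only an illustration of the counting logic, not the Rx implementation: `RefCountSketch` is a hypothetical class, it tracks subscriptions but does not forward any values, and it is not thread-safe.

```csharp
using System;

// Hypothetical sketch of reference-counted connection logic (not Rx itself).
public sealed class RefCountSketch
{
    private readonly Func<Action> _connect;   // connects to the source, returns a disconnect action
    private Action _disconnect = () => { };
    private int _count;

    public RefCountSketch(Func<Action> connect)
    {
        _connect = connect;
    }

    // Returns an action that releases this subscription.
    public Action Subscribe()
    {
        if (++_count == 1)
        {
            _disconnect = _connect();         // first subscriber: connect
        }

        var released = false;
        return () =>
        {
            if (released) return;             // releasing twice is harmless
            released = true;
            if (--_count == 0)
            {
                _disconnect();                // last subscriber gone: disconnect
            }
        };
    }
}
```

With two subscribers, the source is connected once; releasing the first leaves the connection in place, releasing the second disconnects, and a later subscriber triggers a fresh connection.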

var report = word.Select(x => new { Word = x, Length = x.Length });

Edit: If you want to keep the GroupJoin operator, change your report query to this:

    var report =
        wordPub
        .GroupJoin(length,
            s => wordPub,
            s => Observable.Empty<int>(),
            (w, a) => new { Word = w, Lengths = a })
        .SelectMany(i => i.Lengths.FirstAsync().Select(j => new { Word = i.Word, Length = j }));

Upvotes: 0
