colinfang
colinfang

Reputation: 21757

How to return items in a seq only if they are different than their preceding items in F#?

When I write in python, I always try to think of alternatives as if I was using F#:

I have a seq of tuples (key, value1, value2, ...) I simplify the tuple here so it is only of length 2. Keys contain duplicated figures.

let xs = Seq.zip [|1;2;2;3;3;3;4;1;1;2;2|] {0..10} // so this is a seq of tuple

[(1, 0),
 (2, 1),
 (2, 2),
 (3, 3),
 (3, 4),
 (3, 5),
 (4, 6),
 (1, 7),
 (1, 8),
 (2, 9),
 (2, 10)]

Now, I would like to build a function, which takes the seq as input, and return a seq, which is a subset of the original seq.

It must captures all items where the keys are changed, and include the first and last items of the seq if they are not already there.

f1(xs) = [(1, 0), (2, 1), (3, 3), (4, 6), (1, 7), (2, 9), (2, 10)]
f1([]) = []

The following is my python code, it works, but I don't really like it.

xs = zip([1,2,2,3,3,3,4,1,1,2,2], range(11))

def f1(xs):
    if not xs:
        return
    last_a = None # I wish I don't have to use None here.
    is_yield = False
    for a, b in xs:
        if a != last_a:
            last_a = a
            is_yield = True
            yield (a, b)
        else:
            is_yield = False
    if not is_yield:
        yield (a, b) # Ugly, use variable outside the loop.

print list(f1(xs))
print list(f1([]))

Here is another way, using the itertools library

def f1(xs):
    group = None
    for _, group_iter in itertools.groupby(xs, key = lambda pair: pair[0]):
        group = list(group_iter)
        yield group[0]

    # make sure we yield xs[-1], doesn't work if xs is iterator.
    if group and len(group) > 1: # again, ugly, use variable outside the loop.
        yield group[-1]

In F#, Seq.groupBy has difference behaviour from groupby in python. I am wondering how this problem can be solved as functional as possible, and less reference cells, less mutable, and without too much hassle.

Upvotes: 1

Views: 212

Answers (5)

kaefer
kaefer

Reputation: 5751

Sorry to chime in late here. While the answers so far are very good, I feel that they don't express the fundamental need for mutable state in order to return the last element. While I could rely on IEnumerable methods too, sequence expressions are basically equivalent. We start by defining a three-way DU to encapsulate state.

type HitOrMiss<'T> =
| Starting
| Hit of 'T
| Miss of 'T

let foo2 pred xs = seq{
    let store = ref Starting        // Save last element and state
    for current in xs do            // Iterate sequence
        match !store with           // What had we before?
        | Starting ->               // No element yet
            yield current           // Yield first element
            store := Hit current
        | Hit last                  // Check if predicate is satisfied
        | Miss last when pred last current ->
            yield current           // Then yield intermediate element
            store := Hit current
        | _ ->
            store := Miss current
    match !store with
    | Miss last ->                  // Yield last element, if not already
        yield last
    | _ -> () }

[(1, 0)
 (2, 1)
 (2, 2)
 (3, 3)
 (3, 4)
 (3, 5)
 (4, 6)
 (1, 7)
 (1, 8)
 (2, 9)
 (2, 10)]
|> foo2 (fun (a,_) (b,_) -> a <> b) |> Seq.toList |> printfn "%A"
// [(1, 0); (2, 1); (3, 3); (4, 6); (1, 7); (2, 9); (2, 10)]

Upvotes: 0

latkin
latkin

Reputation: 16812

A correct solution needs to be aware of the end of the sequence, in order to satisfy the special case regarding the last element. Thus there either needs to be two passes such that length is known before processing (e.g. Tomas's solution - first pass is copy to list which unlike seq exposes its "end" as you iterate) or you need to rely on IEnumerable methods so that you know as you iterate when the end has been reached (e.g. Daniel's solution).

Below is inspired by the elegance of John's code, but handles the special cases by obtaining the length up front (2-pass).

let remove dup =
    let last = Seq.length dup - 2
    seq{
        yield Seq.head dup
        yield! dup
               |> Seq.pairwise
               |> Seq.mapi (fun i (a,b) -> if fst a <> fst b || i = last then Some(b) else None)
               |> Seq.choose id
    }

Upvotes: 0

Daniel
Daniel

Reputation: 47914

The simplest solution would likely be to convert the sequence to an array and couple John's approach with snatching the first and last elements by index. But, here's another solution to add to the mix:

let f getKey (items: seq<_>) =
  use e = items.GetEnumerator()
  let rec loop doYield prev =
    seq {
      if doYield then yield prev
      if e.MoveNext() then 
        yield! loop (getKey e.Current <> getKey prev) e.Current
      elif not doYield then yield prev
    }
  if e.MoveNext() then loop true e.Current
  else Seq.empty

//Usage: f fst xs

Upvotes: 3

Tomas Petricek
Tomas Petricek

Reputation: 243106

A recursive solution that should work, but also isn't particularly beautiful or short could looks something like this - but using pattern matching definitely makes this a bit nicer:

let whenKeyChanges input = seq {
  /// Recursively iterate over input, when the input is empty, or we found the last
  /// element, we just return it. Otherwise, we check if the key has changed since
  /// the last produced element (and return it if it has), then process the rest 
  let rec loop prevKey input = seq {
    match input with
    | [] -> ()
    | [last] -> yield last
    | (key, value)::tail ->
        if key <> prevKey then yield (key, value)
        yield! loop key tail }

  // Always return the first element if the input is not empty
  match List.ofSeq input with
  | [] -> ()
  | (key, value)::tail -> 
      yield (key, value)
      yield! loop key tail }

If you wanted a nicer and a bit more declarative solution, then you could use data frame and time series library that I've been working on at BlueMountain Capital (not yet officially announced, but should work).

// Series needs to have unique keys, so we add an index to your original keys
// (so we have series with (0, 1) => 0;  (1, 2) => 1; ...
let xs = series <| Seq.zip (Seq.zip [0..10] [1;2;2;3;3;3;4;1;1;2;2]) [0..10]

// Create chunks such that your part of the key is the same in each chunk
let chunks = xs |> Series.chunkWhile (fun (_, k1) (_, k2) -> k1 = k2)

// For each chunk, return the first element, or the first and the last
// element, if this is the last chunk (as you always want to include the last element)
chunks 
|> Series.map (fun (i, k) chunk -> 
    let f = Series.firstValue chunk
    let l = Series.lastValue chunk
    if (i, k) = Series.lastKey chunks then
      if f <> l then [k, f; k, l] else [k, l]
    else [k, f]) 
// Concatenate the produced values into a single sequence
|> Series.values |> Seq.concat

The chunking is the key operation that you need here (see the documentation). The only tricky thing is returning the last element - which could be handled in multiple different ways - not sure if the one I used is the nicest.

Upvotes: 3

John Palmer
John Palmer

Reputation: 25516

I think something like this will work

let remove dup = 
    dup
    |> Seq.pairwise
    |> Seq.filter (fun ((a,b),(c,d)) -> a <> c)
    |> Seq.map fst     

Upvotes: 1

Related Questions