Liu Junhan
Liu Junhan

Reputation: 213

Calculating the difference between two strings

I have two strings

a :: [String]
a = ["A1","A2","B3","C3"]

and

b :: [String]
b = ["A1","B2","B3","D5"]

And I want to calculate the difference between two strings based on the first character and second character and combination of two characters. If the combination of two elements are the same, it would be calculate as 1

The function I declared is

calcP :: [String] -> [String] -> (Int,[String])
calcP (x:xs) (y:ys) = (a,b)
  where
    a = 0 in
      ???
    b = ????

I know that I should have a increment variable to count the correct element, and where I should put it in? For now I totally have no idea about how to do that, can anyone give me some hint??

The desired result would be

(2,["B2","D5"])

How should I do that?

Upvotes: 2

Views: 2448

Answers (3)

jferard
jferard

Reputation: 8180

I assume that the lists have the same size.

The differences between the two lists

Let's focus on the main part of the problem:

Prelude> a=["A1","A2","B3","C3"]
Prelude> b=["A1","B2","B3","D5"]

First, notice that the zip method zips two lists. If you use it on a and b, you get:

Prelude> zip a b
[("A1","A1"),("A2","B2"),("B3","B3"),("C3","D5")]

Ok. It's now time to compare the terms one to one. There are many ways to do it.

Filter

Prelude> filter(\(x,y)->x/=y)(zip a b)
[("A2","B2"),("C3","D5")]

The lambda function returns True if the elements of the pair are different (/= operator). Thus, the filter keeps only the pairs that don't match. It's ok, but you have to do a little more job to keep only the second element of each pair.

Prelude> map(snd)(filter(\(x,y)->x/=y)(zip a b))
["B2","D5"]

map(snd) applies snd, which keeps only the second element of a pair, to every discordant pair.

Fold

A fold is more generic, and may be used to implement a filter. Let's see how:

Prelude> foldl(\l(x,y)->if x==y then l else l++[y])[](zip a b)
["B2","D5"]

The lambda function takes every pair (x,y) and compares the two elements. If they have the same value, the accumulator list remains the identical, but if the values are different, the accumulator list is augmented by the second element.

List comprehension

This is more compact, and should seem obvious to every Python programmer:

Prelude> [y|(x,y)<-zip a b, x/=y] -- in Python [y for (x,y) in zip(a,b) if x!= y]
["B2","D5"]

The number of elements

You want a pair with the number of elements and the elements themselves.

Fold

With a fold, it's easy but cumbersome: you will use a slightly more complicated accumulator, that stores simultaneously the differences (l) and the number of those differences (n).

Prelude> foldl(\(n,l)(x,y)->if x==y then (n,l) else (n+1,l++[y]))(0,[])$zip a b
(2,["B2","D5"])

Lambda

But you can use the fact that your output is redundant: you want a list preceeded by the length of that list. Why not apply a lambda that does the job?

Prelude> (\x->(length x,x))[1,2,3]
(3,[1,2,3])

With a list comprehension, it gives:

Prelude> (\x->(length x,x))[y|(x,y)<-zip a b, x/=y]
(2,["B2","D5"])

Bind operator

Finally, and for the fun, you don't need to build the lambda this way. You could do:

Prelude> ((,)=<<length)[y|(x,y)<-zip a b,x/=y]
(2,["B2","D5"])

What happens here? (,) is a operator that makes a pair from two elements:

Prelude> (,) 1 2
(1,2)

and ((,)=<<length) : 1. takes a list (technically a Foldable) and passes it to the length function; 2. the list and the length are then passed by =<< (the "bind" operator) to the (,) operator, hence the expected result.

Partial conclusion

  • "There is more than than one way to do it" (but it's not Perl!)
  • Haskell offers a lot of builtins functions and operators to handle this kind of basic manipulation.

Upvotes: 2

Daniel Wagner
Daniel Wagner

Reputation: 152707

I'd like to propose a very different method than the other folks: namely, compute a "summary statistic" for each pairing of elements between the two lists, and then combine the summaries into your desired result.

First some imports.

import Data.Monoid
import Data.Foldable

For us, the summary statistic is how many matches there are, together with the list of mismatches from the second argument:

type Statistic = (Sum Int, [String])

I've used Sum Int instead of Int to specify how statistics should be combined. (Other options here include Product Int, which would multiply together the values instead of adding them.) We can compute the summary of a single pairing quite simply:

summary :: String -> String -> Statistic
summary a b | a == b    = (1, [ ])
            | otherwise = (0, [b])

Combining the summaries for all the elements is just a fold:

calcP :: [String] -> [String] -> Statistic
calcP as bs = fold (zipWith summary as bs)

In ghci:

> calcP ["A1", "A2", "B3", "C3"] ["A1", "B2", "B3", "D5"]
(Sum {getSum = 2},["B2","D5"])

This general pattern (of processing elements one at a time into a Monoidal type) is frequently useful, and spotting where it's applicable can greatly simplify your code.

Upvotes: 0

Igor Drozdov
Igor Drozdov

Reputation: 15045

What about doing it recursively? If two elements are the same, the first element of the resulting tuple is incremented; otherwise, the second element of the resulting tuple is appended by the mismatched element:

calcP :: [String] -> [String] -> (Int,[String])
calcP (x:xs) (y:ys)
  | x == y = increment (calcP xs ys)
  | otherwise = append y (calcP xs ys)
  where
    increment (count, results) = (count + 1, results)
    append y (count, results) = (count, y:results)

calcP [] x = (0, x)
calcP x [] = (0, [])

a = ["A1","A2","B3","C3"]
b = ["A1","B2","B3","D5"]

main = print $ calcP a b

The printed result is (2,["B2","D5"])

Note, that

calcP [] x = (0, x)
calcP x [] = (0, [])

are needed to provide exhaustiveness for the pattern matching. In other words, you need to provide the case when one of the passed elements is an empty list. This also provides the following logic:

If the first list is greater than the second one on n elements, these n last elements are ignored.

If the second list is greater than the first one on n elements, these n last elements are appended to the second element of the resulting tuple.

Upvotes: 1

Related Questions