Reputation: 21
I'm new to the Clojure world and to functional programming in general. I'm trying to write a function that computes the probability of a particular list of words occurring given a vocabulary (just a list of words) and a set of probabilities (the probabilities of each of those words occurring). I'm using a simplified bag-of-words model and each outcome is assumed to be independent.
For example, given:
(list 'the 'dog 'boat)
I want it to calculate (0.05) * (0.09) * (0.04) = 0.00018
I already have a function that fetches the probability of each individual word and it works as expected. I'll paste it here for reference:
(defn lookup-probability [w outcomes probs]
(if (not= w (first outcomes)) ;;if the current element is not equal to the word we're looking for...
(lookup-probability w (rest outcomes) (rest probs)) ;;...keep cycling through the vocabulary
(first probs) ;;once we find the right word, fetch the corresponding entry in the probability list
)
)
Here's the part that I'm confused about:
(def sentenceprobs '()) ;;STEP 1
(defn compute-BOW-prob [sentence vocabulary probabilities]
(if (not(empty? sentence))
(def sentenceprobs (conj sentenceprobs (lookup-probability (first sentence) vocabulary probabilities)) ;;STEP 2
(compute-BOW-prob (rest sentence) vocabulary probabilities) ;;STEP 3
)
(product sentenceprobs) ;;STEP 4 (the product function just multiplies all the elements of a list together)
)
)
Here's my general strategy:
This works fine if I only want to use the function once. However, if I want to call it multiple times, sentenceprobs still contains all the probabilities from the previous call. The function will still run, but it just gives me the wrong probability (something much much tinier). So I tried to reset the value of the sentenceprobs at the very end of my function to make it "reusable":
(def sentenceprobs '())
(defn compute-BOW-prob [sentence vocabulary probabilities]
(if (not(empty? sentence))
(def sentenceprobs (conj sentenceprobs (lookup-probability (first sentence) vocabulary probabilities))
(compute-BOW-prob (rest sentence) vocabulary probabilities)
)
(product sentenceprobs)
)
(def sentenceprobs '()) ;; <---THIS IS WHAT I ADDED
)
When I do this, the function doesn't return anything at all. In a sense, that is expected since the function has to return an operation on this list, so making it empty would probably mess that up. But I thought since I'm recursing and returning a value before we ever get out of the if-statement, this wouldn't be a problem. I guess I was mistaken haha.
I've done some poking around on the internet, and it seems like this isn't how def works in Clojure, but I have no idea how to fix it. Does anyone know how I could make this work? Thanks so much.
Upvotes: 2
Views: 196
Reputation: 1618
As you mention, this isn't the way to use def. You are trying to create a list and then append to it from multiple disconnected function calls. That is how imperative languages work, but not functional ones. Here we instead create the list anonymously and pass it around as the return value of functions.
I tried running your code, but compute-BOW-prob didn't compile, so I'm not sure precisely how you expect it to work.
Anyway, here are some points for improvement.
In the first version I tried to modify as little as possible of your original design (def had to go, though).
In your design you tried to return the product only in the base case (empty sentence). That isn't a good design for a recursive function: every branch should return the same type of value. In comp-bow this is solved by having the base case return 1 and then multiplying the probabilities together as the calls return back up the stack.
(defn comp-bow [sentence vocabulary probabilities]
  (if (not (empty? sentence))
    (* (comp-bow (rest sentence) vocabulary probabilities)
       (lookup-probability (first sentence) vocabulary probabilities))
    1))
I don't know if you're familiar with let. It is the closest thing you get to assignment in Clojure; the bindings only live inside the let form.
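As a minimal sketch of how let scopes its bindings (the function name and the numbers are made up purely for illustration):

```clojure
;; Hypothetical helper, not part of the original code.
(defn two-word-prob [p1 p2]
  (let [product (* p1 p2)]   ; `product` is only visible inside this let
    product))                ; the last form in the let is its return value

(two-word-prob 0.5 0.5)
;; => 0.25
```

Nothing outside the let can see or change product, which is why there is no leftover state between calls.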
The next version is quite similar to your design, as it builds the list of probabilities and only performs the multiplication at the end. (I use apply * instead of your product.) Here I separate the function that returns a list from the function that returns the product (as mentioned above). The let here is just for illustrative purposes.
;; The helper is defined first: Clojure must be able to resolve
;; comp-bow2-sub at the point where comp-bow2 is compiled.
(defn comp-bow2-sub [sentence vocabulary probabilities]
  (if (not (empty? sentence))
    (let [sentenceprobs (comp-bow2-sub (rest sentence) vocabulary probabilities)
          word-prob (lookup-probability (first sentence) vocabulary probabilities)]
      ;; the base case returns nil, and (conj nil x) yields the list (x)
      (conj sentenceprobs word-prob))))
(defn comp-bow2 [sentence vocabulary probabilities]
  (apply * (comp-bow2-sub sentence vocabulary probabilities)))
If you are going to write recursive functions you should know about recur. Since Clojure runs on the JVM, which does not eliminate tail calls, you can run into a stack overflow if you make too many nested recursive calls. recur avoids this by reusing the current stack frame instead of adding a new one, but for that to work the recur call must be the last thing the function does (tail position), so that nothing in the current frame is needed afterwards.
This is a bit similar to my first suggestion, but the difference is that I pass the running product (starting at 1) down as an argument so that recur can be used.
;; The helper is defined first so that comp-bow3 can resolve it.
(defn comp-bow3-sub [sentence vocabulary probabilities product]
  (if (empty? sentence)
    product
    (recur (rest sentence) vocabulary probabilities
           (* product (lookup-probability (first sentence) vocabulary probabilities)))))
(defn comp-bow3 [sentence vocabulary probabilities]
  (comp-bow3-sub sentence vocabulary probabilities 1))
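If the extra helper function feels heavy, the same accumulator idea can probably be written with loop, which gives recur a target inside a single function. This is only a sketch: comp-bow4 is a made-up name, lookup-probability is re-declared here as a stand-in so the snippet runs on its own, and the probabilities are invented so the product comes out exact.

```clojure
;; Stand-in for the asker's lookup-probability; assumes the word is present
;; in `outcomes` (it would never terminate otherwise).
(defn lookup-probability [w outcomes probs]
  (if (not= w (first outcomes))
    (recur w (rest outcomes) (rest probs))
    (first probs)))

;; Hypothetical variant: loop holds the remaining words and the running product.
(defn comp-bow4 [sentence vocabulary probabilities]
  (loop [remaining sentence
         product   1]
    (if (empty? remaining)
      product
      (recur (rest remaining)
             (* product
                (lookup-probability (first remaining) vocabulary probabilities))))))

(comp-bow4 '(the dog) '(the dog boat) '(0.5 0.25 0.125))
;; => 0.125
```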
It isn't necessary to use recursion to accomplish this (even though it's quite fun). A solution based on reduce, as suggested by mvarela, is probably clearer for most.
Upvotes: 0
Reputation: 219
Just building a bit on Alan's response. In this case, you have a list of values (words in a sentence), and you want to calculate an aggregate (the probability of all those words happening together, according to some previous probability calculations). I'm assuming you have built your probability table like a map, as Alan did (though I'm using strings instead of keywords for the keys).
To perform the aggregation, we'll use reduce, which allows you to collapse a collection into a single value. It does this by taking a function of two arguments, an accumulator and a value, and applying it to each element of the collection in turn.
The code looks like this:
(def prob-map
{"sleep" 0.3
"dog" 0.09,
"a" 0.2,
"the" 0.05,
"cow" 0.17,
"boat" 0.04,
"everything" 0.15})
(require 'clojure.string) ; make sure the namespace is loaded before using split
(defn compute-BOW-prob [probs sentence]
  (reduce (fn [acc word]
            (* acc (get probs word 1)))
          1
          (clojure.string/split sentence #"\s")))
(compute-BOW-prob prob-map "the dog boat")
;; => 1.7999999999999998E-4
It's essentially the same as Alan's solution, but it does not have a separate step for multiplying the probabilities (this also saves you an intermediate list, which is most likely not an issue in this case, but it might be if you have very large inputs).
The code above takes the probability map and a sentence as inputs. It then splits the sentence (I just used whitespace as the delimiter, but you can handle punctuation and stop words as needed), and reduces over the resulting list using the function provided. That function takes an accumulator (acc) and an element of the list (word), and multiplies the accumulator by the probability of that word (or by 1 if the word isn't found... you could take different approaches on how to handle this, of course). The 1 below the function is the initial value that acc will take.
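To see what acc holds at each step, reductions (from clojure.core) returns every intermediate value instead of only the last one. A small sketch, using the probabilities for "the", "dog" and "boat" from the map above:

```clojure
;; Same function, initial value and inputs as the reduce call would see,
;; but every intermediate accumulator is kept.
(def steps (reductions * 1 [0.05 0.09 0.04]))

(count steps) ; => 4: the initial 1 plus one running product per word
(last steps)  ; the same value that reduce returns
```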
Hope this helps clarify your ideas!
In general, you don't need to modify variables at all, and you should definitely not use def inside functions. Also, try to avoid using globals explicitly, and have your functions take those as arguments.
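If you would rather keep the two parallel lists from the question, one possible way to follow this advice is to pass them in and pair them up with zipmap. This is only a sketch under assumptions: compute-bow-prob-from-lists is a made-up name, the two defs are sample data built from the three words in the question's example, and the words are assumed to be symbols.

```clojure
;; Sample data, assumed layout: the two lists line up position by position.
(def vocabulary '(the dog boat))
(def probabilities '(0.05 0.09 0.04))

;; Hypothetical function: everything it needs arrives as an argument.
(defn compute-bow-prob-from-lists [vocabulary probabilities sentence]
  (let [prob-map (zipmap vocabulary probabilities)] ; e.g. {the 0.05, dog 0.09, ...}
    (reduce (fn [acc word]
              (* acc (get prob-map word 1)))        ; unknown words count as 1
            1
            sentence)))

(compute-bow-prob-from-lists vocabulary probabilities '(the dog boat))
```

Because no state lives outside the function, calling it repeatedly gives the same answer each time.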
Upvotes: 1
Reputation: 29976
Please see this template project. It shows how I like to organize a project (just clone the repository and start coding!). Especially study the list of documentation to see the differences between Clojure and imperative-style languages.
I would approach this problem by using a map to hold the probabilities. You can then use mapv (or just map) to pull the probs out of the map into a vector (or list). Using (apply * ...) then computes the product:
(ns tst.demo.core
(:use tupelo.test)
(:require
[tupelo.core :as t]))
(def prob-map
{:sleep 0.3
:dog 0.09,
:a 0.2,
:the 0.05,
:cow 0.17,
:boat 0.04,
:everything 0.15})
(defn calc-prob
[words]
(let [probs (mapv #(get prob-map %) words)]
(apply * probs)))
(dotest
(let [sentence [:the :dog :boat]
result (calc-prob sentence)
expected (t/spyx (* 0.05 0.09 0.04)) ; spyx displays the value
]
(is (t/rel= result expected :digits 8))))
You can run it via:
> lein clean
> lein test
to produce the output:
--------------------------------------
Clojure 1.10.2-alpha1 Java 15
--------------------------------------
Testing tst.demo.core
(* 0.05 0.09 0.04) => 1.7999999999999998E-4
Ran 2 tests containing 1 assertions.
0 failures, 0 errors.
Upvotes: 0