Solaiman Mansyur

Reputation: 63

Clojure read CSV and split the columns into several vectors

Currently I have these definitions:

(require '[clojure.java.io :as io]
         '[clojure.data.csv :as csv])

(def csv-file (.getFile (io/resource "datasources.csv")))

(defn process-csv [file]
  (with-open [in-file (io/reader file)]
    (doall (csv/read-csv in-file))))

What I need to do now is produce one vector per column of the CSV, i.e. group the values by column. My process-csv output looks like this:

(["atom" "neutron" "photon"] 
[10 22 3] 
[23 23 67])

My goal is to generate three vectors, one each for the columns atom, neutron and photon:

atom: [10 23]
neutron: [22 23]
photon: [3 67]

FYI, I define three empty vectors before reading the CSV file:

(def atom [])
(def neutron [])
(def photon [])

Upvotes: 1

Views: 1291

Answers (2)

leetwinski

Reputation: 17859

First of all, you can't modify the vectors you've defined; that is the nature of immutable data structures. If you really need mutable state, wrap the vector in an atom.
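As a minimal sketch of the atom approach mentioned above: the names neutron-values and add-neutron! are made up for illustration and are not part of the question. The vector itself stays immutable; swap! replaces the value held by the atom with a new vector.

```clojure
;; Hypothetical names, for illustration only.
(def neutron-values (atom []))

(defn add-neutron!
  "Appends n to the vector held in neutron-values."
  [n]
  (swap! neutron-values conj n))

(add-neutron! 22)
(add-neutron! 23)

@neutron-values
;; => [22 23]
```

For this particular task an atom is probably unnecessary, since the columns can be computed in one pass from the rows.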

You can solve your task this way:

user> (def items (rest '(["atom" "neutron" "photon"] 
                         [10 22 3] 
                         [23 23 67]
                         [1 2 3]
                         [5 6 7])))

user> (let [[atom neutron photon] (apply map vector items)]
        {:atom atom :neutron neutron :photon photon})
{:atom [10 23 1 5], :neutron [22 23 2 6], :photon [3 67 3 7]}

This is how it works: (apply map vector items) is equivalent to the following:

(map vector [10 22 3] [23 23 67] [1 2 3] [5 6 7])

It takes the first item of each collection and makes a vector of them, then the second items, and so on.
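A small sketch of that transposition on its own, using the same sample rows. One caveat worth knowing: map stops as soon as the shortest collection is exhausted, so a row with a missing field silently drops the trailing columns.

```clojure
(def rows [[10 22 3] [23 23 67] [1 2 3] [5 6 7]])

;; Each result vector collects one column.
(apply map vector rows)
;; => ([10 23 1 5] [22 23 2 6] [3 67 3 7])

;; The second row is one field short, so the third column disappears.
(apply map vector [[10 22 3] [23 23]])
;; => ([10 23] [22 23])
```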

Also, you can make it more robust by taking the column names directly from the header row of your CSV data:

user> (def items '(["atom" "neutron" "photon"] 
                   [10 22 3] 
                   [23 23 67]
                   [1 2 3]
                   [5 6 7]))
#'user/items

user> (zipmap (map keyword (first items))
              (apply map vector (rest items)))
{:atom [10 23 1 5], :neutron [22 23 2 6], :photon [3 67 3 7]}
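One thing to keep in mind when applying this to real output from clojure.data.csv: read-csv returns every field as a string, so the values need to be parsed if you want numbers. A rough sketch, assuming every data field is a plain integer with no surrounding whitespace and that there is at least one data row; rows->columns is a hypothetical helper name:

```clojure
(defn rows->columns
  "Takes rows as returned by csv/read-csv (header row first) and returns
   a map from column keyword to a vector of parsed longs."
  [rows]
  (let [[header & data] rows
        ;; Assumption: every field parses as a long.
        parsed (map (fn [row] (mapv #(Long/parseLong %) row)) data)]
    (zipmap (map keyword header)
            (apply map vector parsed))))

(rows->columns [["atom" "neutron" "photon"]
                ["10" "22" "3"]
                ["23" "23" "67"]])
;; => {:atom [10 23], :neutron [22 23], :photon [3 67]}
```

With the definitions from the question, this could be called as (rows->columns (process-csv csv-file)).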

Upvotes: 6

Mars

Reputation: 8854

I'll illustrate some other methods you could use, which can be combined with methods that leetwinski illustrates. Like leetwinski, I'll suggest using a hash map as your final structure, rather than three symbols containing vectors. That's up to you.

If you want, you can use core.matrix's transpose to do what leetwinski does with (apply map vector ...):

(require '[clojure.core.matrix :as mx])
(mx/transpose '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))

which produces:

[["atom" 10 23] ["neutron" 22 23] ["photon" 3 67]]

transpose is designed to work on any kind of matrix that implements the core.matrix protocols, and normal Clojure sequences of sequences are treated as matrices by core.matrix.

To generate a map, here's one approach:

(into {} (map #(vector (keyword (first %)) (rest %))
              (mx/transpose '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))))

which produces:

{:atom (10 23), :neutron (22 23), :photon (3 67)}

keyword makes strings into keywords. #(vector ...) makes a pair, and (into {} ...) takes the sequence of pairs and makes a hash map from them.
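If you would rather have vectors than sequences as the map values, one possible variant wraps the rest of each column in vec. This sketch uses (apply map vector ...) in place of mx/transpose only so that it runs without the core.matrix dependency; col-name and values are illustrative local names.

```clojure
(def rows '(["atom" "neutron" "photon"] [10 22 3] [23 23 67]))

(into {}
      ;; Destructure each transposed column into its header and its values.
      (map (fn [[col-name & values]]
             [(keyword col-name) (vec values)]))
      (apply map vector rows))
;; => {:atom [10 23], :neutron [22 23], :photon [3 67]}
```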

Or if you want the vectors in vars, as you specified, then you can use a variant of leetwinski's let method. I suggest not defing the symbol atom, because that's the name of a standard function in Clojure.

(let [[adam neutron proton] (mx/transpose 
                              (rest '(["atom" "neutron" "photon"]
                                      [10 22 3]
                                      [23 23 67])))]
  (def adam adam)
  (def neutron neutron)
  (def proton proton))

It's not exactly good form to use def inside a let, but you can do it. Also, I don't recommend giving the locals bound by let the same names as the top-level variables. As you can see, it makes the defs confusing. I did this on purpose here just to show how the scoping rule works: in (def adam adam), the first instance of "adam" names the top-level var that gets defined, whereas the second instance of "adam" refers to the local bound by let, containing [10 23]. The result is:

  adam ;=> [10 23]
  neutron ;=> [22 23]
  proton ;=> [3 67]
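A possible way to get top-level vars without a def inside a let is to build the map once and pull each column out of it. This is only a sketch: columns and atom-values are names chosen for illustration (atom-values avoids shadowing clojure.core/atom), and it uses (apply map vector ...) rather than core.matrix.

```clojure
(def columns
  (let [[header & data] '(["atom" "neutron" "photon"]
                          [10 22 3]
                          [23 23 67])]
    (zipmap (map keyword header)
            (apply map vector data))))

;; One top-level var per column.
(def atom-values (:atom columns))
(def neutron (:neutron columns))
(def photon (:photon columns))

atom-values ;=> [10 23]
neutron     ;=> [22 23]
photon      ;=> [3 67]
```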

(I think there are probably some subtleties that I'm expressing incorrectly. If so, someone will no doubt comment about it.)

Upvotes: 1
