Reputation: 73
Not quite sure where to start with this. I have a big file of data in which each column holds the values for a certain thing (e.g. the data in column 1 would be the hour). The file is 15 columns wide. It does not contain any column headings, though; it is all just numeric data.
I need to read this data into a data type such as a hash map which would allow me to sort through it and query the data using things such as contains? as well as perform calculations.
I am new to Clojure and unsure how to do this. Any help would be appreciated.
My file is a txt file (saved as mydata.txt) and structured like so:
1 23 25 -9 -0 1 1
2 23 25 10 1 2 3
My code so far is:
(def filetoanalyse (slurp "mydata.txt"))
(zipmap [:num1 :num2 :num3 :num4 :num5 :num6 :num7] filetoanalyse)
At the moment it seems to associate the whole of the file with :num1.
Upvotes: 3
Views: 511
Reputation: 50017
Here's a function you can use to do what you're looking for:
(defn map-from-file [field-re column-names filename]
  (let [lines (re-seq #"[^\r\n]+" (slurp filename))]
    (map #(zipmap column-names (re-seq field-re %)) lines)))
You have to supply three arguments:
A regular expression that matches the fields in each row. For the data you've shown this can be #"[^ ]+", i.e. anything which isn't a blank is part of the field. If you've got simple comma-separated values with no complications such as embedded commas in the data or quoted fields, something like #"[^,]+" will work. Or, if you want to extract only numeric characters, something a bit more complex such as #"[-0-9]+" will work.
A collection of column names to assign.
The name of the file.
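To make the first argument more concrete, here is a small sketch of what each of the three regular expressions mentioned above extracts with re-seq. The sample strings are made up for illustration and are not taken from the question's file.

```clojure
;; Space-separated fields: everything that is not a blank
(def space-fields (re-seq #"[^ ]+" "1 23 25 -9 -0 1 1"))
;; => ("1" "23" "25" "-9" "-0" "1" "1")

;; Simple comma-separated fields (no quoting, no embedded commas)
(def comma-fields (re-seq #"[^,]+" "1,23,25"))
;; => ("1" "23" "25")

;; Only runs of digits and minus signs; other characters are skipped
(def numeric-fields (re-seq #"[-0-9]+" "a1 b-9 c25"))
;; => ("1" "-9" "25")
```

Note that in every case the fields come back as strings, not numbers.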
So if the data you show in your question is stored as test3.dat somewhere, you could invoke the above function as
(map-from-file #"[^ ]+" [:c1 :c2 :c3 :c4 :c5 :c6 :c7] "/some-path/test3.dat")
and it would return
({:c1 "1", :c2 "23", :c3 "25", :c4 "-9", :c5 "-0", :c6 "1", :c7 "1"} {:c1 "2", :c2 "23", :c3 "25", :c4 "10", :c5 "1", :c6 "2", :c7 "3"})
or in other words you get back a sequence of maps which map the values by the column names you've supplied. If you prefer to have the data in a vector you can use
(into [] (map-from-file #"[^ ]+" [:c1 :c2 :c3 :c4 :c5 :c6 :c7] "/some-path/test3.dat"))
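The values in these maps are still strings, so they would need converting before you can do calculations. A minimal sketch, assuming every field is a whole number that Long/parseLong accepts; parse-row is a hypothetical helper, not part of the function above, and rows is typed in by hand here to stand in for the result of map-from-file:

```clojure
;; Hypothetical helper: convert every string value in a row map to a long.
;; Assumes all values are integers; Long/parseLong throws otherwise.
(defn parse-row [row]
  (into {} (map (fn [[k v]] [k (Long/parseLong v)])) row))

;; Stand-in for the output of map-from-file shown above
(def rows
  [{:c1 "1", :c2 "23", :c3 "25", :c4 "-9", :c5 "-0", :c6 "1", :c7 "1"}
   {:c1 "2", :c2 "23", :c3 "25", :c4 "10", :c5 "1", :c6 "2", :c7 "3"}])

(def parsed (map parse-row rows))

;; Sum of column :c4 across all rows
(def c4-total (reduce + (map :c4 parsed)))
;; => 1
```

If the file may contain decimals, Double/parseDouble could be swapped in instead.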
Upvotes: 2
Reputation: 4901
Main answer
slurp returns the file contents as a text string, but your code seems to assume that the file has already been parsed into an array of numbers. That is not the case. You can still use slurp, but you will have to parse the file yourself. You can parse it by first splitting the file string by line separator using split-lines. Each line is valid Clojure syntax for a vector if we surround it with square brackets, and if we do so, we can then parse it into a vector using edn/read-string. We use map to parse each line of the file. The following code will do the job, and uses the ->> macro to keep the code readable:
(require '[clojure.string :as cljstr])
(require '[clojure.edn :as edn])
(->> "/tmp/mydata.txt"
     slurp
     cljstr/split-lines
     (map #(zipmap
            [:num1 :num2 :num3 :num4 :num5 :num6 :num7]
            (edn/read-string (str "[" % "]")))))
;; => ({:num1 1, :num2 23, :num3 25, :num4 -9, :num5 0, :num6 1, :num7 1} {:num1 2, :num2 23, :num3 25, :num4 10, :num5 1, :num6 2, :num7 3})
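Once the rows are maps of numbers, the querying, sorting and calculations mentioned in the question can be done with ordinary sequence functions. The following is only a sketch: rows is typed in by hand to mirror the result above, and the particular queries are made-up examples. One thing to be aware of is that contains? on a map checks for a key, not a value.

```clojure
;; Stand-in for the parsed result shown above
(def rows
  [{:num1 1, :num2 23, :num3 25, :num4 -9, :num5 0, :num6 1, :num7 1}
   {:num1 2, :num2 23, :num3 25, :num4 10, :num5 1, :num6 2, :num7 3}])

;; contains? tests whether a key is present in the map
(contains? (first rows) :num4) ;; => true
(contains? (first rows) :num8) ;; => false

;; Query: rows whose :num4 is negative
(def negative-rows (filter #(neg? (:num4 %)) rows))

;; Sort: rows ordered by :num4, largest first
(def by-num4-desc (sort-by :num4 > rows))

;; Calculation: average of column :num3
(def avg-num3 (/ (reduce + (map :num3 rows)) (count rows)))
;; => 25
```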
Extensions/variations
In case there are lines with a different number of elements, you may want to keep only those with seven elements, using filter. Mapping and filtering can be composed into a transducer that we pass as an argument to into:
(let [columns [:num1 :num2 :num3 :num4 :num5 :num6 :num7]
      n (count columns)]
  (->> "/tmp/mydata.txt"
       slurp
       cljstr/split-lines
       (into [] (comp (map #(edn/read-string (str "[" % "]")))
                      ;; check the length before zipmap, which would
                      ;; silently truncate rows that are too long
                      (filter #(= n (count %)))
                      (map #(zipmap columns %))))))
;; => [{:num1 1, :num2 23, :num3 25, :num4 -9, :num5 0, :num6 1, :num7 1} {:num1 2, :num2 23, :num3 25, :num4 10, :num5 1, :num6 2, :num7 3}]
If you expect to parse more complicated files, or really want to over-engineer it, you could use spec:
(require '[clojure.spec.alpha :as spec])
(->> "/tmp/mydata.txt"
     slurp
     cljstr/split-lines
     (map #(edn/read-string (str "[" % "]")))
     (spec/conform (spec/coll-of (spec/cat :num1 number?
                                           :num2 number?
                                           :num3 number?
                                           :num4 number?
                                           :num5 number?
                                           :num6 number?
                                           :num7 number?))))
;; => ({:num1 1, :num2 23, :num3 25, :num4 -9, :num5 0, :num6 1, :num7 1} {:num1 2, :num2 23, :num3 25, :num4 10, :num5 1, :num6 2, :num7 3})
Upvotes: 2
Reputation: 3212
The problem you're running into is that slurp reads in the file as a string. When you use zipmap on it, it uses characters from the string as the values in the map, leading to this mess:
(zipmap [:num1 :num2 :num3 :num4 :num5 :num6 :num7] (slurp "mydata.txt"))
;;=> {:num1 \space,
;;    :num2 \space,
;;    :num3 \1,
;;    :num4 \space,
;;    :num5 \2,
;;    :num6 \3,
;;    :num7 \space}
The easiest approach is to iterate over the file line by line, splitting each line into the values that you want. Note the vec here, which forces the result of the (lazy) for, ensuring that we process the entire file before with-open closes the reader.
(require 'clojure.java.io 'clojure.string) ; make sure both namespaces are loaded

(with-open [reader (clojure.java.io/reader "mydata.txt")]
  (vec (for [line (line-seq reader)]                 ; iterate over each line
         (->> (clojure.string/split line #"\s+")     ; split it by whitespace
              (remove empty?)                        ; remove any empty entries
              (map #(Long/parseLong %))              ; convert into Longs (change if another format is more suitable)
              (zipmap [:num1 :num2 :num3 :num4 :num5 :num6 :num7]))))) ; turn into a map
Upvotes: 3