Reputation: 37
How to convert flatMap of a text file to flatMap of characters? I have to count the occurrences of each character in a text file. What approach should I take after the following code?
val words = readme.flatMap(line => line.split(" ")).collect()
Upvotes: 2
Views: 2092
Reputation: 41
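// Copy the classpath resource /a.txt to a temporary file so Spark can read it by path.
// (ByteStreams and Files here are Guava classes.)
val txt = a.getClass.getResourceAsStream("/a.txt")
val txtFile = File.createTempFile("a", "txt")
txtFile.deleteOnExit()
ByteStreams.copy(txt, Files.newOutputStreamSupplier(txtFile))
// Split each line into words, then flatten each word into its characters.
val tokenized = sc.textFile(txtFile.toString).flatMap(_.split(' '))
val char = tokenized.flatMap(_.toCharArray)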
Upvotes: 0
Reputation: 13985
If you are only interested in chars, then I think you probably want to count spaces (' ') too:
val chars = readme.flatMap(line => line.toCharArray)
// but if you don't want to count spaces,
// val chars = readme.flatMap(line => line.toCharArray.filter(_ != ' '))
val charsCount = chars
.map(c => (c, 1))
.reduceByKey((i1: Int, i2: Int) => i1 + i2)
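The same counting logic can be checked locally without a Spark context. The following is only a sketch on plain Scala collections: countChars and includeSpaces are hypothetical names not taken from the answer, and groupBy followed by a sum stands in for reduceByKey, which exists only on pair RDDs.

```scala
// Plain-Scala sketch of the map-then-sum-per-key idea (no Spark required).
// countChars and includeSpaces are hypothetical names used for illustration.
def countChars(lines: Seq[String], includeSpaces: Boolean = true): Map[Char, Int] = {
  // Flatten every line into its characters.
  val chars = lines.flatMap(_.toCharArray)
  // Optionally drop spaces, as in the commented-out variant above.
  val kept = if (includeSpaces) chars else chars.filter(_ != ' ')
  kept
    .map(c => (c, 1))                                      // pair each char with a count of 1
    .groupBy(_._1)                                         // local stand-in for grouping by key
    .map { case (c, pairs) => (c, pairs.map(_._2).sum) }   // sum the 1s for each char
}

val counts = countChars(Seq("hello world", "yay more lines"))
```

On an RDD the last two steps collapse into the single reduceByKey call shown above, which also combines counts within each partition before shuffling.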
Upvotes: 0
Reputation: 149568
In order to convert each String into its constituent characters, you need an additional flatMap:
val characters = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
scala> val lines = Array("hello world", "yay more lines")
lines: Array[String] = Array(hello world, yay more lines)
scala> lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
res3: Array[Char] = Array(h, e, l, l, o, w, o, r, l, d, y, a, y, m, o, r, e, l, i, n, e, s)
Although this is a Scala console, it will work the same on an RDD.
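Splitting on " " first and then flattening should give the same characters as flattening each line directly and filtering out the space character. The sketch below checks that on the sample input from the console session; viaSplit and viaFilter are illustrative names, and the equivalence is only claimed for the single space character (tabs and other whitespace are kept by both variants).

```scala
// Sketch: two ways to get the non-space characters of every line.
val lines = Array("hello world", "yay more lines")

// Variant 1: split each line into words, then flatten each word into chars.
val viaSplit = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)

// Variant 2: flatten each line into chars, then drop the space character.
val viaFilter = lines.flatMap(_.toCharArray).filter(_ != ' ')

// Arrays compare by reference with ==, so compare element by element.
val sameResult = viaSplit.sameElements(viaFilter)
```

If spaces should be counted as well, skip the split and the filter and flatten each line directly.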
Upvotes: 1