Govind Yadav

Reputation: 37

Count number of character occurrences from input text file

How do I turn a flatMap over the lines of a text file into a flatMap over its characters? I need to count the occurrences of each character in a text file. What should I do after the following code?

val words = readme.flatMap(line => line.split(" ")).collect()

Upvotes: 2

Views: 2092

Answers (3)

muyexm329

Reputation: 41

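import java.io.File
import com.google.common.io.{ByteStreams, Files}

// Copy the classpath resource to a temporary file so Spark can read it by path.
// `a` is assumed to be any object loaded from the same classpath as /a.txt.
val txt = a.getClass.getResourceAsStream("/a.txt")
val txtFile = File.createTempFile("a", "txt")
txtFile.deleteOnExit()
ByteStreams.copy(txt, Files.newOutputStreamSupplier(txtFile))
// Split each line into words, then each word into its characters
val tokenized = sc.textFile(txtFile.toString).flatMap(_.split(' '))
val char = tokenized.flatMap(_.toCharArray)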

Upvotes: 0

sarveshseri

Reputation: 13985

If you are only interested in characters, you probably want to count spaces (' ') as well:

val chars = readme.flatMap(line => line.toCharArray)

// If you do not want to count spaces, filter them out instead:
// val chars = readme.flatMap(line => line.toCharArray.filter(_ != ' '))

val charsCount = chars
  .map(c => (c, 1))
  .reduceByKey((i1: Int, i2: Int) => i1 + i2)
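The same counting logic can be checked on an in-memory collection before running it on an RDD. This is only a minimal sketch: `LocalCharCount`, `countChars`, and the `keepSpaces` flag are hypothetical names that are not part of the code above, and `groupBy` followed by a sum stands in for `reduceByKey`, which plain Scala collections do not have.

```scala
object LocalCharCount {
  // Hypothetical helper: counts characters in in-memory lines,
  // mirroring the map + reduceByKey steps with plain collections.
  def countChars(lines: Seq[String], keepSpaces: Boolean): Map[Char, Int] =
    lines
      .flatMap(line => line.toSeq.filter(c => keepSpaces || c != ' '))
      .map(c => (c, 1))
      .groupBy(_._1) // group the (char, 1) pairs by character
      .map { case (c, pairs) => (c, pairs.map(_._2).sum) } // sum the 1s per character

  def main(args: Array[String]): Unit = {
    val lines = Seq("hello world", "yay more lines")
    println(countChars(lines, keepSpaces = true)(' '))           // prints 3
    println(countChars(lines, keepSpaces = false).contains(' ')) // prints false
  }
}
```

With `keepSpaces = true` the space character gets its own entry in the result; with `false` it is dropped before counting, which matches the commented-out variant above.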

Upvotes: 0

Yuval Itzchakov

Reputation: 149568

To break each String into its individual characters, you need an additional flatMap:

val characters = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)

scala> val lines = Array("hello world", "yay more lines")
lines: Array[String] = Array(hello world, yay more lines)

scala> lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
res3: Array[Char] = Array(h, e, l, l, o, w, o, r, l, d, y, a, y, m, o, r, e, l, i, n, e, s)

Although this was run on an Array in the Scala console, the same calls work on an RDD.
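To try the two flatMap steps on an actual RDD and go on to the per-character counts the question asks for, something like the following could be used. It is a sketch under assumptions: the `local[*]` master, the app name, and the names `RddCharCount` and `countChars` are illustrative only, and the input is the same two sample lines rather than a real file. On older Spark versions, `reduceByKey` may also need `import org.apache.spark.SparkContext._`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCharCount {
  // Hypothetical helper: splits lines into words, words into characters,
  // then counts each character with reduceByKey.
  def countChars(sc: SparkContext, input: Seq[String]): Map[Char, Int] = {
    val lines = sc.parallelize(input)
    val characters = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
    characters.map(c => (c, 1)).reduceByKey(_ + _).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying the pipeline on one machine
    val conf = new SparkConf().setAppName("rdd-char-count").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val counts = countChars(sc, Seq("hello world", "yay more lines"))
      // Print the counts in character order
      counts.toSeq.sortBy(_._1).foreach { case (c, n) => println(s"$c: $n") }
    } finally {
      sc.stop()
    }
  }
}
```

Because the lines are split on spaces first, spaces never reach the count. For a real file, `sc.textFile(path)` would replace `sc.parallelize(input)`.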

Upvotes: 1
