cacao

Reputation: 35

How to use split in Spark Scala?

I have an input file that looks something like this:

1: 3 5 7

3: 6 9

2: 5

......

I would like to get two lists: the first made up of the numbers before the ":", and the second made up of the numbers after it. For the example above, the two lists are:

1 3 2

3 5 7 6 9 5

I have written the following code:

 val rdd = sc.textFile("input.txt")

 val s = rdd.map(_.split(":"))

But I do not know how to implement the rest. Thanks.

Upvotes: 0

Views: 3377

Answers (1)

Katya Willard

Reputation: 2182

I would use flatMap! So,

val rdd = sc.textFile("input.txt")
val s = rdd.map(_.split(": ")) // I recommend splitting on the colon plus a space
val before_colon = s.map(x => x(0))
val after_colon = s.flatMap(x => x(1).split(" "))

Now you have two RDDs, one with the items from before the colon, and one with the items after the colon!

If it is possible for the part of the text before the colon to contain multiple numbers, as in an example like 1 2 3: 4 5 6, I would write val before_colon = s.flatMap(x => x(0).split(" ")) instead.
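As a quick sanity check of the logic above, here is the same transformation in plain Scala without Spark; a local Seq stands in for the RDD and the file contents are hard-coded, but map and flatMap behave the same way:

```scala
// Plain-Scala sketch of the same transformations (assumes the ": " separator).
val lines = Seq("1: 3 5 7", "3: 6 9", "2: 5")

val s = lines.map(_.split(": "))
val beforeColon = s.map(x => x(0))                // Seq("1", "3", "2")
val afterColon  = s.flatMap(x => x(1).split(" ")) // Seq("3", "5", "7", "6", "9", "5")
```

The key difference is that map keeps one output element per input line, while flatMap flattens each line's array of numbers into a single collection.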

Upvotes: 4
