My initial RDD is a list of blocks, where each block is itself a list of lines. So the RDD is
[infos_var1, infos_var2]
and each block is
var_name, var_value1, var_value2, var_value3
The original data looks like this:
[[u'::852-YF-007\t',
u'2016-05-10 00:00:00\t0',
u'2016-05-09 23:59:00\t0',
u'2016-05-09 23:42:00\t0'],
[u'::852-YF-008\t',
u'2016-05-10 00:00:00\t0',
u'2016-05-09 23:59:00\t0',
u'2016-05-09 23:42:00\t0']]
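To pin down what "extract the variable name as key" means for one block, here is a minimal sketch in plain Python. It assumes every block's first line has the form `::<name>\t` and that all remaining lines are timestamp lines; the helper name `split_block` is hypothetical.

```python
def split_block(block):
    """Split one block into (variable_name, timestamp_lines).

    Assumes block[0] looks like u'::852-YF-007\t' and that the
    remaining elements are the timestamped value lines.
    """
    header, lines = block[0], block[1:]
    # strip() removes the trailing tab; lstrip(':') removes the '::' prefix.
    name = header.strip().lstrip(':')
    return name, lines


block = [u'::852-YF-007\t',
         u'2016-05-10 00:00:00\t0',
         u'2016-05-09 23:59:00\t0',
         u'2016-05-09 23:42:00\t0']
name, lines = split_block(block)
```

Since the function takes a single block and returns a `(key, value)` tuple, it has the shape that an RDD `map` expects for producing a pair RDD.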
My question is how to use a map function to extract the variable name (852-YF-007 and 852-YF-008) as the key, and the lines with the timestamps (here: 3 lines for each variable) as the value.
Maybe someone can give me a hint on how to use map on my RDD. I was thinking of something like this:
rdd.map(lambda block: (block[0], block[1:]))
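`RDD.map` calls a one-argument function once per element, and here each element is a whole block. The function therefore receives the block and can slice it: `block[0]` is the header line and `block[1:]` holds the timestamp lines. Tuple parameters such as `lambda (k, v): ...` only parse in Python 2, because PEP 3113 removed them in Python 3. The sketch below is a minimal illustration under those assumptions. It uses Python's built-in `map` as a stand-in for `RDD.map` so it runs without Spark, and the name `to_pair` is hypothetical.

```python
# Sample data in the same shape as the RDD elements: a list of blocks.
data = [
    [u'::852-YF-007\t', u'2016-05-10 00:00:00\t0',
     u'2016-05-09 23:59:00\t0', u'2016-05-09 23:42:00\t0'],
    [u'::852-YF-008\t', u'2016-05-10 00:00:00\t0',
     u'2016-05-09 23:59:00\t0', u'2016-05-09 23:42:00\t0'],
]

# One-argument function: it gets a whole block and returns (key, value).
# The key is the header with the trailing tab and the '::' prefix removed.
to_pair = lambda block: (block[0].strip().lstrip(':'), block[1:])

# Built-in map stands in for rdd.map(to_pair); dict() mimics collecting pairs.
pairs = dict(map(to_pair, data))
```

With Spark, the same function would be passed as `rdd.map(to_pair)`. Calling `collectAsMap()` on the result would return a dict on the driver, which is only sensible when the data is small.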
PS: The original post on how I created my initial RDD can be found here.