carbassot

Reputation: 151

Is it possible to split a value by two delimiters in an RDD using PySpark?

I have tuples like this:

('id1', 'date;type;value\n2017-11-11 08:32:46.934;no_error;54.64325\n2017-11-11 08:32:47.356;no:error;35.46643\n')

I want to split the value on both ';' and '\n', but I found that I can't combine the two delimiters in a single split call, and I don't know what to do instead. So far, I've got this:

rdd.mapValues(lambda t: t.split(';'))

Is there any way I can split it by both delimiters?

Upvotes: 0

Views: 48

Answers (1)

akuiper

Reputation: 214957

You can use re.split with the regex ;|\n, which matches either delimiter:

import re
rdd.mapValues(lambda t: re.split(';|\n', t))
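
To see what this produces, here is a minimal sketch in plain Python (no Spark needed) run on the sample value from the question. The helper names split_flat and split_rows are hypothetical, made up for illustration. Note that because the value ends with '\n', the flat split yields a trailing empty string; if you would rather keep one list per line, splitting into lines first and then on ';' may be a better fit.

```python
import re

# Sample value copied from the question's tuple.
value = ('date;type;value\n'
         '2017-11-11 08:32:46.934;no_error;54.64325\n'
         '2017-11-11 08:32:47.356;no:error;35.46643\n')

def split_flat(t):
    # Hypothetical helper: split on either ';' or a newline,
    # equivalent to the ';|\n' pattern above.
    return re.split(r'[;\n]', t)

def split_rows(t):
    # Hypothetical helper: one list of fields per line.
    # splitlines() does not produce a trailing empty entry.
    return [line.split(';') for line in t.splitlines()]

flat = split_flat(value)   # single flat list, last element is ''
rows = split_rows(value)   # list of three 3-field lists

print(flat)
print(rows)
```

Either helper can be passed to the RDD in the same way as the lambda, for example rdd.mapValues(split_rows).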

Upvotes: 2
