Rv R
Rv R

Reputation: 311

modify a string column and replace substring pypsark

I have a pyspark dataframe with a Name column with sample values as follows:

 id   NAME
 ---+-------
  1  aaa bb c
  2  xx yy z
  3  abc def 
  4  qw er
  5  jon lee ls G

I have to flip the right most part and populate it on the left side with comma and delete the right most substring(split using the space)

Expected output

id   NAME
---+-------
 1   c, aaa bb
 2   z, xx yy
 3   def, abc
 4   er, qw
 5   G, jon lee ls 

I was able to get the right most part to add it with comma by using below code: split_col=split(df['NAME'],' ') df2 = df2.withColumn('NAME_RIGHT',split_col.getItem(F.size(split_col) - 1))

the above line gives

NAME_RIGHT
  c
  z
  def 
  er

I want to replace the values in NAME_RIGHT i.e. the right most values from the NAME column, I tried using below code but it replaces nothing, how can this be achieved?

df3 = df2.withColumn('NEW_NAME', regexp_replace(F.col("NAME"), str(df2.NAME_RIGHT),""))

Upvotes: 0

Views: 648

Answers (1)

pltc
pltc

Reputation: 6082

Regex would be a bit cumbersome, I'd suggest to use split and concat instead.

from pyspark.sql import functions as F

(df
    .withColumn('n1', F.split('name', ' '))
    .withColumn('n2', F.reverse('n1')[0])
    .withColumn('n3', F.concat_ws(' ', F.array_except('n1', F.array('n2'))))
    .withColumn('n4', F.concat_ws(', ', F.array('n2', 'n3')))
    .show()
)

# +---+------------+-----------------+---+----------+-------------+
# | id|        name|               n1| n2|        n3|           n4|
# +---+------------+-----------------+---+----------+-------------+
# |  1|    aaa bb c|     [aaa, bb, c]|  c|    aaa bb|    c, aaa bb|
# |  2|     xx yy z|      [xx, yy, z]|  z|     xx yy|     z, xx yy|
# |  3|    abc def |     [abc, def, ]|   |   abc def|    , abc def|
# |  4|       qw er|         [qw, er]| er|        qw|       er, qw|
# |  5|jon lee ls G|[jon, lee, ls, G]|  G|jon lee ls|G, jon lee ls|
# +---+------------+-----------------+---+----------+-------------+

Upvotes: 1

Related Questions