how to concat values of columns in pyspark

Question

I have a dataframe. A column "names" contains columns headers, which values should be concated. I want to do it with pyspark concat_ws() but nothing works. I must solve it with concat_ws() function, no pandas etc.

The best what I got it was concated headers, but not values in this columns. I couldn't return list from function to unpack it in concat_ws()

map_cols = {'a':'newA', 'b':'newB', 'c':'newC', 'd':'newD'}

@udf
def get_result(names_col):
    headers = []
    for i in names_col:
        headers.append(map_cols[i])
    return headers

df = df.withColumn('names_arr', split('names', '_')).withColumn('result', concat_ws(';', get_result(col('names_arr'))))

Input dataframe:

names   | newA|newB|newC|newD
---------------------------
a_b     |1    | 2  | 7  |8
---------------------------
a_b_c   |2    | 3  | 4  |4
---------------------------
a_b_c_d |3    | 2  |4   |4
---------------------------
c_d     | 89  |  5 |3   |5
---------------------------
b_c_d   |  7  |5   |6   | 5


Expected output dataframe

names   | newA|newB|newC|newD|result
--------------------------------------
a_b     |1    | 2  | 7  | 8  |1;2
--------------------------------------
a_b_c   |2    | 3  | 4  |4   |2;3;4
--------------------------------------
a_b_c_d |3    | 2  |4   |4   |2;3;4;4
--------------------------------------
c_d     |89   |  5 |3   |5   |3;5
--------------------------------------
b_c_d   |7    |5   |6   | 5  |5;6;5

how to concat values of columns in pyspark

Answers (1)

Related Questions