morienor

Reputation: 379

PySpark - create column based on column names referenced in another column

I am quite new to Spark and can't get this to work. Hopefully there is an easy way of doing it. What I am trying to do is best described by the following table (I need to derive the "required" column):

    colA    colB    colC    ref     required
1   a1      b1      c1      colA    a1
2   a2      b2      c2      colA    a2
3   a3      b3      c3      colB    b3
4   a4      b4      c4      colB    b4
5   a5      b5      c5      colC    c5
6   a6      b6      c6      colC    c6

The above is just an example; in the real data I have more than 50 columns, so writing out the conditions by hand is not going to work.
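For reproducibility, the table above can be built as a Spark DataFrame; a minimal sketch, assuming an active SparkSession named spark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# build the example table as a DataFrame (the row index is not needed)
data = [
    ('a1', 'b1', 'c1', 'colA'),
    ('a2', 'b2', 'c2', 'colA'),
    ('a3', 'b3', 'c3', 'colB'),
    ('a4', 'b4', 'c4', 'colB'),
    ('a5', 'b5', 'c5', 'colC'),
    ('a6', 'b6', 'c6', 'colC'),
]
df = spark.createDataFrame(data, ['colA', 'colB', 'colC', 'ref'])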

I know this can be easily done in pandas using something like:

df['required'] = df.apply(lambda x: x.loc[x.ref], axis=1)

or

df['required'] = df.lookup(df.index, df.ref)

Any suggestions how to do this in PySpark?

Upvotes: 4

Views: 2912

Answers (1)

akuiper

Reputation: 215117

One way to do this is with the when and coalesce functions:

import pyspark.sql.functions as F

cols = ['colA', 'colB', 'colC']
# keep each column's value only on rows where ref names that column,
# then coalesce to pick the single non-NULL value per row
df.withColumn('required', F.coalesce(*[F.when(df.ref == c, df[c]) for c in cols])).show()
+----+----+----+----+--------+
|colA|colB|colC| ref|required|
+----+----+----+----+--------+
|  a1|  b1|  c1|colA|      a1|
|  a2|  b2|  c2|colA|      a2|
|  a3|  b3|  c3|colB|      b3|
|  a4|  b4|  c4|colB|      b4|
|  a5|  b5|  c5|colC|      c5|
|  a6|  b6|  c6|colC|      c6|
+----+----+----+----+--------+

Basically you check which column name the ref column is equal to, and take the value from that column: F.when(df.ref == c, df[c]) builds, for each candidate column, a new column whose values are kept on rows where ref names it and are NULL everywhere else. Coalescing that list of columns then fills each row with the value from the one column that matched.
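Since the real DataFrame has more than 50 columns, the candidate list can also be derived programmatically rather than written out by hand. A minimal sketch, assuming every column other than ref is a potential lookup target:

import pyspark.sql.functions as F

# Assumption: every column except 'ref' is a possible lookup target
cols = [c for c in df.columns if c != 'ref']

df.withColumn('required', F.coalesce(*[F.when(F.col('ref') == c, F.col(c)) for c in cols])).show()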

Upvotes: 9
