Reputation: 1035
I'm trying to replace the values in one column of my PySpark DataFrame. The DataFrame df looks like this:
+--------+------+------+
|   hello|  this|column|
+--------+------+------+
|     132|   234|   abc|
|34563465|134134|   def|
|      12|    34|   ghi|
|     132|   234|   jkl|
|34563465|134134|   mno|
|      12|    34|   pqr|
|     132|   234|   stu|
|34563465|134134|   ghi|
|      12|    34|   pqr|
+--------+------+------+
What I want to do is replace every value in the 'column' column, along the lines of this:
    df['column'] = df['column'].map({'abc': 'cba',
                                     'def': 'fed',
                                     'ghi': 'ihg',
                                     'jkl': 'lkj',
                                     'mno': 'onm',
                                     'pqr': 'rqp',
                                     'stu': 'uts'})
So that the dataframe will then look like this:
+--------+------+------+
|   hello|  this|column|
+--------+------+------+
|     132|   234|   cba|
|34563465|134134|   fed|
|      12|    34|   ihg|
|     132|   234|   lkj|
|34563465|134134|   onm|
|      12|    34|   rqp|
|     132|   234|   uts|
|34563465|134134|   ihg|
|      12|    34|   rqp|
+--------+------+------+
How can I make this change in PySpark?
Upvotes: 0
Views: 340
Reputation: 1549
You can do it with the DataFrame's replace method, passing the mapping as a dict and limiting it to the one column with subset:
    mapping = {
        'abc': 'cba',
        'def': 'fed',
        'ghi': 'ihg',
        'jkl': 'lkj',
        'mno': 'onm',
        'pqr': 'rqp',
        'stu': 'uts'
    }
    df = df.replace(to_replace=mapping, subset=['column'])
Upvotes: 1