Reputation: 123
So I've gone through all the examples here for replacing special characters in column names, but I can't seem to get it to work for periods.
What I've tried:
import re
from pyspark.sql import functions as F

# works to remove spaces
df.select([F.col(c).alias(c.replace(' ', '_')) for c in df.columns])
# doesn't work to remove periods
df.select([F.col(c).alias(c.replace('.', '')) for c in df.columns])
# removes special characters except periods
df.select([F.col(col).alias(re.sub("[^0-9a-zA-Z$]+", "", col)) for col in df.columns])
I know how to change the name of a specific column by referencing it directly, but this needs to rename the columns of any dataframe whose column names contain special characters.
Specifically, here is the column name that is giving me trouble: "Src. of Business Contact Full Name"
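For context, F.col() parses an unescaped dot as struct-field access, so the second attempt above typically fails with an AnalysisException rather than renaming anything. A minimal reproduction, assuming an active SparkSession:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('x',)], ['Src. of Business Contact Full Name'])

# F.col() treats the dot as field access on a column named 'Src',
# which does not exist, so this typically raises an AnalysisException:
df.select([F.col(c).alias(c.replace('.', '')) for c in df.columns])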
Upvotes: 2
Views: 2826
Reputation: 31226
select() is a deprecated method in pandas. Why not make it as simple as below?
import re
import pandas as pd

df = pd.DataFrame(["a biz"], columns=["Src.$ of-Business Contact` Full Name"])
# first strip . , ` $ entirely, then turn spaces, commas and hyphens into _
df.columns = [re.sub(r"[ ,-]", "_", re.sub(r"[.,`$]", "", c)) for c in df.columns]
df
output
  Src_of_Business_Contact_Full_Name
0                             a biz
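If the dataframe is actually a Spark one, as in the question, a minimal sketch of the same cleanup uses toDF(), which takes the replacement names positionally (same regexes as above):

import re

# assumes df is a pyspark DataFrame; toDF() accepts the new column names positionally
df = df.toDF(*[re.sub(r"[ ,-]", "_", re.sub(r"[.,`$]", "", c)) for c in df.columns])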
Upvotes: 1
Reputation: 8410
Another way to go about this is with reduce and withColumnRenamed.
from functools import reduce

# withColumnRenamed matches the existing name literally, so the dot needs no escaping
reduce(lambda new_df, c: new_df.withColumnRenamed(c, c.replace('.', '')), df.columns, df).show()
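A quick check with the column name from the question (assuming an active SparkSession):

from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('x',)], ['Src. of Business Contact Full Name'])
renamed = reduce(lambda new_df, c: new_df.withColumnRenamed(c, c.replace('.', '')), df.columns, df)
print(renamed.columns)
# ['Src of Business Contact Full Name']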
Upvotes: 0
Reputation: 31510
Try escaping the column names with backticks: `col_name`.
from pyspark.sql import functions as F

df = spark.createDataFrame([('1', '2')], ['header', 'pla.nned'])
df.columns
#['header', 'pla.nned']

# wrapping each name in backticks makes Spark treat the dot literally
df.select([F.col("`{0}`".format(c)).alias(c.replace('.', '')) for c in df.columns]).show()
#+------+-------+
#|header|planned|
#+------+-------+
#|     1|      2|
#+------+-------+
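The same escaping also combines with the regex from the question to strip every special character, not just periods; a sketch, assuming the same df:

import re
from pyspark.sql import functions as F

# backticks keep Spark from parsing the existing name; the alias strips all special characters
df.select([F.col("`{0}`".format(c)).alias(re.sub("[^0-9a-zA-Z$]+", "", c)) for c in df.columns]).show()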
Upvotes: 1