Reputation: 701
For example, here is my existing header:
DataPartition|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime|^|SourceTypeCode|^|DocumentId|^|Dcn|^|DocFormat|^|StatementDate|^|IsFilingDateTimeEstimated|^|ContainsPreliminaryData|^|CapitalChangeAdjustmentDate|^|CumulativeAdjustmentFactor|^|ContainsRestatement|^|FilingDateTimeUTCOffset|^|ThirdPartySourceCode|^|ThirdPartySourcePriority|^|SourceTypeId|^|ThirdPartySourceCodeId|^|FFAction|!|
I want to create a header like the one below:
DataPartition_1|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime_1|^|SourceTypeCode_1|^|DocumentId_1|^|Dcn_1|^|DocFormat_1|^|StatementDate_1|^|IsFilingDateTimeEstimated_1|^|ContainsPreliminaryData_1|^|CapitalChangeAdjustmentDate_1|^|CumulativeAdjustmentFactor_1|^|ContainsRestatement_1|^|FilingDateTimeUTCOffset_1|^|ThirdPartySourceCode_1|^|ThirdPartySourcePriority_1|^|SourceTypeId_1|^|ThirdPartySourceCodeId_1|^|FFAction_1
I want to append _1 to every header column except TimeStamp, Source.organizationId, and Source.sourceId.
I have done this with withColumn, but that way I have to handle each column individually.
Is there an easier way to do it, for example with foldLeft?
Upvotes: 1
Views: 149
Reputation: 7928
First, you need to define a list of the columns you want to skip:
val columnsToAvoid = List("TimeStamp","Source.organizationId","Source.sourceId")
Then you can foldLeft over the DataFrame's column list (given by df.columns), renaming each column that is not in the columnsToAvoid list and returning the DataFrame unchanged otherwise.
df.columns.foldLeft(df)((acc, elem) =>
  if (columnsToAvoid.contains(elem)) acc
  else acc.withColumnRenamed(elem, elem + "_1"))
A quick example here:
Original DF
+-----+------+-----------+
| word| value|  TimeStamp|
+-----+------+-----------+
|wordA|valueA|45435345435|
|wordB|valueB|  454244345|
|wordC|valueC|32425425435|
+-----+------+-----------+
Operation:
df.columns.foldLeft(df)((acc, elem) => if (columnsToAvoid.contains(elem)) acc else acc.withColumnRenamed(elem, elem+"_1")).show
Result:
+------+-------+-----------+
|word_1|value_1|  TimeStamp|
+------+-------+-----------+
| wordA| valueA|45435345435|
| wordB| valueB|  454244345|
| wordC| valueC|32425425435|
+------+-------+-----------+
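As a possible alternative to chaining withColumnRenamed, the new names can be computed up front and applied in a single call. The sketch below is only an illustration: RenameSketch and withSuffix are hypothetical names that are not part of Spark, and the column list is a shortened version of the header from the question. The name mapping itself is plain Scala, so it runs without a Spark session; the commented line shows how it might be applied to a real DataFrame with toDF.

```scala
// Hypothetical helper object (not a Spark API): computes renamed headers.
object RenameSketch {
  // Columns that should keep their original names.
  val columnsToAvoid: Set[String] =
    Set("TimeStamp", "Source.organizationId", "Source.sourceId")

  // Appends `suffix` to every column name that is not in `skip`.
  def withSuffix(columns: Seq[String], skip: Set[String], suffix: String): Seq[String] =
    columns.map(c => if (skip.contains(c)) c else c + suffix)

  def main(args: Array[String]): Unit = {
    // Shortened sample of the header from the question (assumption for brevity).
    val columns = Seq("DataPartition", "TimeStamp", "Source.organizationId",
      "Source.sourceId", "FilingDateTime", "FFAction")

    val renamed = withSuffix(columns, columnsToAvoid, "_1")
    println(renamed.mkString("|^|"))
    // prints: DataPartition_1|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime_1|^|FFAction_1

    // With a real DataFrame `df`, the names could then be applied in one step:
    //   val result = df.toDF(withSuffix(df.columns.toSeq, columnsToAvoid, "_1"): _*)
  }
}
```

Both approaches should give the same header; this one just separates the naming rule from the DataFrame, which makes the rule easy to check on its own.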
Upvotes: 1