user3897533
user3897533

Reputation: 477

How to do SELECT with column name that appears twice in DataFrame?

I have the following code..

DataFrame addressDF = sqlContext.read().parquet(addressParquetPath);
DataFrame propertyDF = sqlContext.read().parquet(propertyParquetPath);

DataFrame joinedFrame = addressDF.join(propertyDF, propertyDF.col("LOCID").equalTo(addressDF.col("locid")), "left");

joinedFrame.registerTempTable("joinedFrame");
DataFrame joinedFrameSelect = sqlContext.sql("SELECT LOCID,AddressID FROM joinedFrame");

in the Select LocID is listed twice, how do i pick the LocId of Address instead of property.

Can i execute select on the dataframe by column index?

Upvotes: 1

Views: 1307

Answers (1)

David Griffin
David Griffin

Reputation: 13927

I usually rename the column -- you can either try:

...join(propertyDF.withColumnRenamed("LocID", "LocID_R"), ...

Or if you want to change all of the column names for a DataFrame in one go -- such as add an _R for "right" to every name -- you can try this:

df.toDF(df.columns.map(_ + "_R"):_*)

This is useful when you are joining a DataFrame back onto itself.

Upvotes: 4

Related Questions