Reputation: 477
I have the following code..
DataFrame addressDF = sqlContext.read().parquet(addressParquetPath);
DataFrame propertyDF = sqlContext.read().parquet(propertyParquetPath);
DataFrame joinedFrame = addressDF.join(propertyDF, propertyDF.col("LOCID").equalTo(addressDF.col("locid")), "left");
joinedFrame.registerTempTable("joinedFrame");
DataFrame joinedFrameSelect = sqlContext.sql("SELECT LOCID,AddressID FROM joinedFrame");
in the Select LocID is listed twice, how do i pick the LocId of Address instead of property.
Can i execute select on the dataframe by column index?
Upvotes: 1
Views: 1307
Reputation: 13927
I usually rename the column -- you can either try:
...join(propertyDF.withColumnRenamed("LocID", "LocID_R"), ...
Or if you want to change all of the column names for a DataFrame
in one go -- such as add an _R
for "right" to every name -- you can try this:
df.toDF(df.columns.map(_ + "_R"):_*)
This is useful when you are joining a DataFrame
back onto itself.
Upvotes: 4