Reputation: 900
I use Spark 2.3.2 and want to aggregate two columns, but the .agg()
function tells me there is a problem with the column names, and I don't see what it is.
Some pseudocode with the actual column names:
df = spark.read.parquet('./my_files')
[... doing some stuff with the data everything works fine ...]
df2 = df.groupBy(AD_ID).agg({'pagerank':'sum','pagerankRAW':'sum'})
When I do that, Spark throws this exception:
AnalysisException: 'Attribute name "sum(pagerankRAW)" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;'
But I don't see any invalid characters; there are only letters in my column names. When I delete 'pagerankRAW':'sum'
from the dict, I get the same error, but this time for sum(pagerank).
So what am I doing wrong?
Upvotes: 0
Views: 646
Reputation: 56
It looks like a weird one; PySpark should be able to handle parentheses.
I use a different syntax when I use agg(), though.
I'd use .agg(sum("pagerank"), sum("pagerankRAW")) and I don't get this error.
I don't think you can use alias() with your dict syntax, because I don't see where it would go.
With alias: .agg(sum("pagerank").alias("pagerank"), sum("pagerankRAW").alias("pagerankRAW"))
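Putting it together, a minimal sketch, assuming AD_ID is the (string) grouping column name and the pagerank / pagerankRAW columns from the question. Note that sum here has to be the one from pyspark.sql.functions, not Python's builtin; importing it under another name avoids shadowing:

from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as spark_sum  # column-wise sum, not Python's builtin

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet('./my_files')

# Alias each aggregate so the result columns contain no parentheses
df2 = (
    df.groupBy('AD_ID')
      .agg(
          spark_sum('pagerank').alias('pagerank'),
          spark_sum('pagerankRAW').alias('pagerankRAW'),
      )
)
df2.printSchema()  # columns: AD_ID, pagerank, pagerankRAW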
Upvotes: 1