AWS Comprehend and PySpark - F.when Not Working

Question

I have rows of sentences in different languages, with the language code in a separate column. I only want to process certain languages (en, es, fr, or de), since I know AWS Comprehend does not support 'nl' (Dutch). However, I keep getting an error that 'nl' is not supported, even though it is not listed in my F.when condition and therefore should never be sent to the Comprehend UDF. Any ideas on what might be wrong?

Here is my code:

import boto3
import pyspark.sql.functions as F


def detect_sentiment(text, language):
    # Calls AWS Comprehend for one sentence; a new client is created on every call.
    comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')
    sentiment_analysis = comprehend.detect_sentiment(Text=text, LanguageCode=language)
    return sentiment_analysis


# No returnType is given, so Spark treats the UDF result as a string (StringType).
detect_sentiment_udf = F.udf(detect_sentiment)

# Intended behavior: call the UDF only for en, es, fr and de rows; all other rows get null.
reviews_4 = reviews_3.withColumn(
    'RAW_SENTIMENT_SCORE',
    F.when(F.col('LANGUAGE').isin('en', 'es', 'fr', 'de'),
           detect_sentiment_udf('SENTENCE', 'LANGUAGE')).otherwise(None))

reviews_4.show(50)

I get this error:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the DetectSentiment operation: 
Value 'nl' at 'languageCode'failed to satisfy constraint: Member must satisfy enum value set: [ar, hi, ko, zh-TW, ja, zh, de, pt, en, it, fr, es]
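A likely cause: the PySpark documentation for F.udf notes that user-defined functions do not support conditional expressions or short-circuiting, so Spark may still evaluate the UDF on rows that the surrounding F.when(...) would discard. The documented workaround is to move the condition into the function itself. The sketch below assumes the same four languages as the question; the name detect_sentiment_safe and the optional client parameter (used only so the function can be tested without AWS) are hypothetical additions, and it returns only the "Sentiment" label from the DetectSentiment response instead of the whole dict.

```python
from typing import Any, Optional

# Languages the question chooses to send to Comprehend (assumption taken from the F.when condition).
SUPPORTED_LANGUAGES = frozenset({"en", "es", "fr", "de"})


def detect_sentiment_safe(text: Optional[str], language: Optional[str],
                          client: Any = None) -> Optional[str]:
    """Return Comprehend's overall sentiment label, or None for rows that must be skipped.

    The language check lives inside the function because Spark does not
    guarantee that a Python UDF is skipped for rows excluded by F.when(...).
    """
    if not text or language not in SUPPORTED_LANGUAGES:
        # Unsupported or missing values (e.g. 'nl') never reach Comprehend.
        return None
    if client is None:
        # Imported lazily so the function can be exercised with a stand-in client.
        import boto3
        client = boto3.client(service_name="comprehend", region_name="us-west-2")
    response = client.detect_sentiment(Text=text, LanguageCode=language)
    return response["Sentiment"]  # e.g. "POSITIVE", "NEGATIVE", "NEUTRAL" or "MIXED"
```

With this version the UDF can be registered as F.udf(detect_sentiment_safe) and called on every row, e.g. reviews_3.withColumn('RAW_SENTIMENT_SCORE', detect_sentiment_safe_udf('SENTENCE', 'LANGUAGE')); the F.when wrapper is then no longer needed for correctness.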
