Liu Yu
Liu Yu

Reputation: 391

PySpark, add a tag column in union dataframe with condition

I have a union dataframe from two sources. I want to add a flag column based on 'ID' and 'Source', The rule:

  1. If the rating is Null and from only one Source and one ID, flag N
  2. If the rating is not Null and from one ID, regardless the Source, flag Y; if Null, flag N
  3. If the rating is Null from both Sources and one ID, flag Y

How do I create the flag column? I tried Windows Function, totally WRONG, Is there anyway I can combine Windows Function and When? Or is there a better way to add a flag col? I am stuck.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql import SparkSession as ss
from pyspark.sql.functions import when, col, lit, row_number
from pyspark.sql.window import Window

spark = ss.builder.appname('new').getOrCreate()

data = [(14, 'AA', None), (14, 'AA', None), (15, 'BB', None), (15, 'BB', 2), (16, 'AA', None), (16, 'AA', 1), (16, 'BB', None), (16, 'BB', 2), (17, 'AA', None), (17, 'AA', None), (17, 'BB', None), (17, 'BB', None)]

schema=StructType([StructField('ID', IntegerType(), False), StructField('Source', StringType(), False), StructField('rating', IntegerType(), True)])

df = Spark.createDataFrame(data, schema)

w = Window.partitionby('ID').orderBy('Source', 'rating')
df = df.withColumn('flag', row_number().over(w))

input table:

|ID|Source |rating
|--|-----|------
| 14  | AA   |Null
| 14  | AA   |NUll
| 15  | BB   |Null
| 15  | BB   |2
| 16  | AA   |Null
| 16  | AA   |1
| 16  | BB   |Null
| 16  | BB   |2
| 17  | AA   |Null
| 17  | AA   |Null
| 17  | BB   |Null
| 17  | BB   |Null

Ideally output table:

| ID| Source | rating| flag
| --| -----|------|-----
| 14  | AA   |Null |N
| 14  | AA   |NUll |N
| 15  | BB   |Null |N
| 15  | BB   |2    |Y
| 16  | AA   |Null |N
| 16  | AA   |1    |Y
| 16  | BB   |Null |N
| 16  | BB   |2    |Y
| 17  | AA   |Null |Y
| 17  | AA   |Null |Y
| 17  | BB   |Null |Y
| 17  | BB   |Null |Y

Upvotes: 0

Views: 103

Answers (0)

Related Questions