Reputation: 29
I have the following dataframe in PySpark, which has already been grouped by the column "accountname".
accountname | namespace | cost | cost_to_pay
account001 | ns1 | 93 | 9
account001 | Transversal | 93 | 25
account002 | ns2 | 50 | 27
account002 | Transversal | 50 | 12
I need a new column holding "cost" - "cost_to_pay", taken from the row where "namespace" == "Transversal". I need this result repeated in every row of the new column for that account, something like this:
accountname | namespace | cost | cost_to_pay | new_column1
account001 | ns1 | 93 | 9 | 68
account001 | Transversal | 93 | 25 | 68
account002 | ns2 | 50 | 27 | 38
account002 | Transversal | 50 | 12 | 38
68 is the result of 93 - 25 for the account001 group, and 38 is the result of 50 - 12 for account002.
Any idea how I can achieve this?
Upvotes: 1
Views: 1197
Reputation: 3817
If df is your dataframe after the groupby, you can build a df_temp using:
from pyspark.sql import functions as F
## keep only the Transversal rows and compute the difference on them
df_temp = df.filter(F.col('namespace') == 'Transversal')
df_temp = df_temp.withColumn('new_column1', F.col('cost') - F.col('cost_to_pay'))
df_temp = df_temp.select('accountname', 'new_column1')  ## keep only relevant columns
## you might want to have some extra checks, like dropping duplicates, etc
## and finally join df_temp with your main dataframe df
df = df.join(df_temp, on='accountname', how='left')
df = df.na.fill({'new_column1': 0})  ## if you wish to fill nulls with some predefined value, like 0
Upvotes: 1
Reputation: 42352
You can get the difference for each accountname using the maximum of a masked difference:
from pyspark.sql import functions as F, Window
df2 = df.withColumn(
    'new_column1',
    F.max(
        F.when(
            F.col('namespace') == 'Transversal',
            F.col('cost') - F.col('cost_to_pay')
        )
    ).over(Window.partitionBy('accountname'))
)
df2.show()
+-----------+-----------+----+-----------+-----------+
|accountname|  namespace|cost|cost_to_pay|new_column1|
+-----------+-----------+----+-----------+-----------+
| account001|        ns1|  93|          9|         68|
| account001|Transversal|  93|         25|         68|
| account002|        ns2|  50|         27|         38|
| account002|Transversal|  50|         12|         38|
+-----------+-----------+----+-----------+-----------+
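To see why the masked maximum works: F.when without an otherwise yields null for every non-Transversal row, and F.max skips nulls, so each accountname partition is left with only its Transversal difference, which the window then writes onto every row of that partition. Below is a rough plain-Python sketch of the same three steps (mask, aggregate per account, broadcast). It does not use Spark at all; the function name add_transversal_diff and the list-of-dicts layout are just assumptions made for illustration.

```python
# Plain-Python illustration of "max of a masked difference" per account.
# Hypothetical helper and data layout; the sample rows come from the question.
rows = [
    {"accountname": "account001", "namespace": "ns1", "cost": 93, "cost_to_pay": 9},
    {"accountname": "account001", "namespace": "Transversal", "cost": 93, "cost_to_pay": 25},
    {"accountname": "account002", "namespace": "ns2", "cost": 50, "cost_to_pay": 27},
    {"accountname": "account002", "namespace": "Transversal", "cost": 50, "cost_to_pay": 12},
]


def add_transversal_diff(rows):
    """Return copies of rows with new_column1 set to the Transversal
    (cost - cost_to_pay) of the row's accountname, or None if there is none."""
    best = {}
    for r in rows:
        # Mask: only Transversal rows contribute a value (Spark would give null elsewhere).
        if r["namespace"] != "Transversal":
            continue
        diff = r["cost"] - r["cost_to_pay"]
        key = r["accountname"]
        # Aggregate: keep the maximum difference seen for this account.
        best[key] = diff if key not in best else max(best[key], diff)
    # Broadcast: every row gets its account's value, like the window function does.
    return [{**r, "new_column1": best.get(r["accountname"])} for r in rows]


result = add_transversal_diff(rows)
for r in result:
    print(r["accountname"], r["namespace"], r["new_column1"])
```

Note that if an account had several Transversal rows, both this sketch and the window version would keep the largest difference, and an account with no Transversal row would end up with None (null in Spark).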
Upvotes: 2