Reputation: 55
I would like to sum the values in one column based on a condition in another column. This works when the condition value exists, but when it does not, I get an error. I need the code to accept that the value is missing and move on to the next step.
Example df:
import pandas as pd
technologies = ({
    'Courses': ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark","Python"],
    'Fee': [22000,25000,23000,24000,26000,25000,25000,22000],
    'Duration': ['30days','50days','55days','40days','60days','35days','55days','50days']
})
df = pd.DataFrame(technologies, columns=['Courses','Fee','Duration'])
print(df)
   Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2   Hadoop  23000   55days
3   Python  24000   40days
4   Pandas  26000   60days
5   Hadoop  25000   35days
6    Spark  25000   55days
7   Python  22000   50days
For this example, I would like to sum the fee for all rows that have a Duration of "55days":
duration = df.groupby('Duration')['Fee'].sum()["55days"]
print(duration)
48000
# But if I choose a value that does not appear under Duration, such as "22days", I get a KeyError
duration22 = df.groupby('Duration')['Fee'].sum()["22days"]
Can you please advise how I can code this so that, if the value "22days" happens not to exist on a given run, the lookup does not fail and instead returns 0?
Upvotes: 0
Views: 465
Reputation: 3720
You could do a pre-lookup check in the grouped index.
gd_sum = df.groupby('Duration')['Fee'].sum()
def dur_sum(k):
    return gd_sum[k] if k in gd_sum.index else 0
print(dur_sum('55days'))
48000
print(dur_sum('22days'))
0
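As a possible alternative to the explicit index check, here is a minimal sketch of two other approaches that should behave the same way on the example data. It assumes the question's DataFrame; the names fee_by_duration, total_55, total_22 and mask_total_22 are made up for illustration. The first uses Series.get, which returns a default instead of raising for a missing label. The second skips the groupby and filters with a boolean mask, relying on the sum of an empty selection being 0.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Hadoop", "Spark", "Python"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 25000, 25000, 22000],
    "Duration": ["30days", "50days", "55days", "40days", "60days", "35days", "55days", "50days"],
})

# Total fee per Duration value (illustrative name, not from the original answer).
fee_by_duration = df.groupby("Duration")["Fee"].sum()

# Option 1: Series.get returns the given default when the label is missing,
# instead of raising KeyError.
total_55 = fee_by_duration.get("55days", 0)
total_22 = fee_by_duration.get("22days", 0)

# Option 2: filter rows with a boolean mask; when no row matches,
# the selection is empty and its sum is 0.
mask_total_22 = df.loc[df["Duration"] == "22days", "Fee"].sum()

print(total_55, total_22, mask_total_22)
```

If many different durations are looked up, the groupby result with .get is computed once and reused; for a single one-off lookup, the boolean mask may be simpler.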
Upvotes: 1