user11509999

Pandas DataFrame: Calculate percentage difference between rows?

I have a year-wise dataframe in which each row has three fields: year, type and value. I'm trying to calculate the percentage of Taken vs. Empty for each year. For example, 2014 has a total of 100, made up of 50 Empty and 50 Taken, so 50% is Empty and 50% is Taken, as shown in the final df.

df

    year     type          value
            
0     2014  Total          100
1     2014  Empty           50
2     2014  Taken           50
3     2013  Total          2000
4     2013  Empty          100
5     2013  Taken          1900
6     2012  Total          50
7     2012  Empty          45
8     2012  Taken           5

Final df

    year    Empty          Taken
            
0   2014    50             50
1   2013    ...            ...
2   2012    ...            ...

Should I shift cells up and then calculate the percentage, or is there another method?

Upvotes: 1

Views: 1347

Answers (3)

Mit

Reputation: 716

As @sophocles pointed out, you can use pivot_table to rearrange your dataframe. To add to that answer: I think you're after the percentage, so I suggest keeping the 'Total' record and dividing by it:

# Pivot so each type becomes a column (one row per year)
res = df.pivot_table(index='year', columns='type', values='value').reset_index()
# Express Empty and Taken as fractions of Total (multiply by 100 for percent)
res['Empty'] = res['Empty'] / res['Total']
res['Taken'] = res['Taken'] / res['Total']
# Keep only the columns needed in the final dataframe
res = res[['year', 'Empty', 'Taken']]
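
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It assumes the dataframe looks exactly like the sample in the question; the names wide, pct and final_df are only illustrative, and the result is scaled to percent because the expected output in the question shows 50 rather than 0.5.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# One row per year, one column per type (Empty, Taken, Total).
wide = df.pivot_table(index="year", columns="type", values="value")

# Scale to percent, then divide each row by that year's Total.
pct = wide[["Empty", "Taken"]].mul(100).div(wide["Total"], axis=0)

# Newest year first, with year as an ordinary column.
final_df = pct.sort_index(ascending=False).reset_index()
final_df.columns.name = None  # drop the leftover "type" label on the columns
print(final_df)
```

Dividing by the Total row rather than by Empty + Taken only matters if the two can disagree in your real data; in the sample they are identical.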

Upvotes: 1

sophocles

Reputation: 13821

You can use pivot_table:

new = df[df['type'] != 'Total']
res = (new.pivot_table(index='year',columns='type',values='value').sort_values(by='year',ascending=False).reset_index())

which gets you:

res
      year  Empty  Taken
0     2014     50     50
1     2013    100   1900
2     2012     45      5

And then you can get the percentages for each column:

total = (res['Empty'] + res['Taken'])
for col in ['Empty','Taken']:
    res[col+'_perc'] = res[col] / total


year  Empty  Taken  Empty_perc  Taken_perc                                     
2014     50     50        0.50        0.50
2013    100   1900        0.05        0.95
2012     45      5        0.90        0.10
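
If you would rather avoid the loop, the same shares can be computed in one step by dividing every row by its own sum. This is only a sketch: res is rebuilt by hand here from the pivoted counts shown above, and cols and shares are illustrative names.

```python
import pandas as pd

# Counts per year, as produced by the pivot step (values from the question).
res = pd.DataFrame({
    "year":  [2014, 2013, 2012],
    "Empty": [50, 100, 45],
    "Taken": [50, 1900, 5],
})

cols = ["Empty", "Taken"]

# Row-wise share of each column: divide every row by its own sum.
shares = res[cols].div(res[cols].sum(axis=1), axis=0)

# Append the shares next to the counts with a "_perc" suffix.
res = res.join(shares.add_suffix("_perc"))
print(res)
```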

Upvotes: 3

ggaurav

Reputation: 1804

You can keep only the records whose type is Empty or Taken, then group by year and apply func. Inside func, set type as the index, look up the required values, and calculate the percentages. The x passed to func is a dataframe holding the type and value columns for a single group.

def func(x):
    # x holds the 'type' and 'value' rows for one year
    x = x.set_index('type')
    total = x['value'].sum()
    return [(x.loc['Empty', 'value'] / total) * 100, (x.loc['Taken', 'value'] / total) * 100]

temp = (df[df['type'].isin({'Empty', 'Taken'})]
        .groupby('year')[['type', 'value']]
        .apply(func))
temp

year
2012    [90.0, 10.0]
2013    [5.0, 95.0] 
2014    [50.0, 50.0]
dtype: object

Convert the result into the required dataframe

pd.DataFrame(temp.values.tolist(), index=temp.index, columns=['Empty', 'Taken'])
       Empty    Taken
year        
2012    90.0    10.0
2013    5.0     95.0
2014    50.0    50.0
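
A possible variant that skips the intermediate lists: compute each year's total with groupby(...).transform('sum'), divide, and pivot the percentages directly. This is a sketch on the question's sample data, not a drop-in replacement; parts, year_total and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2013, 2013, 2013, 2012, 2012, 2012],
    "type":  ["Total", "Empty", "Taken"] * 3,
    "value": [100, 50, 50, 2000, 100, 1900, 50, 45, 5],
})

# Keep only the Empty and Taken rows.
parts = df[df["type"].isin(["Empty", "Taken"])].copy()

# Per-row total of Empty + Taken for that row's year.
year_total = parts.groupby("year")["value"].transform("sum")
parts["pct"] = parts["value"] * 100 / year_total

# One row per year, one column per type.
result = parts.pivot(index="year", columns="type", values="pct")
print(result)
```

Because transform returns a series aligned with the original rows, the division stays vectorized and the final pivot yields the Empty and Taken columns without building a dataframe from lists.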

Upvotes: 0
