Sahil
Sahil

Reputation: 439

Subtracting values from a column on a specific condition and getting a new DataFrame

I have a Dataset with two columns, I would like to do some operations on a particular column and get a new dataframe altogether. Consider this as my dataset:

A   B
1   01
1   56
1   89
1   108
2   23
2   36
2   89
3   13
4   45

I would like to perform two operations on the column B and create a dataframe with these 2 columns. 1st Column would be the Highest number for 1 ie - 108 subtracted by its least - 1 (108 - 1), for 2 - (89 - 23) and if its a single instance it should directly be 0. 2nd Column would be a specific number, assume it to be 125 subtracted by the very first instance of value in A ie ( 125 - 1), (125 - 23), (125 - 13)... We should get something like this:

A   C     D
1   107  124
2   66   102
3   0    112
4   0    80

I was thinking of using .loc to find the specific position of the value and then subtract it, How should I do this?

Upvotes: 1

Views: 839

Answers (3)

jezrael
jezrael

Reputation: 863341

Use agg by first and custom function with lambda, then rename columns and substract 125 with D :

df = df.groupby('A')['B'].agg([lambda x: x.max() - x.min(), 'first']) \
                         .rename(columns={'first':'D','<lambda>':'C'}) \
                         .assign(D= lambda x: 125 - x['D']) \
                         .reset_index() 
print (df)
   A    C    D
0  1  107  124
1  2   66  102
2  3    0  112
3  4    0   80

rename is necessary, because deprecate groupby agg with a dictionary when renaming.

Another solution:

df = df.groupby('A')['B'].agg(['min','max', 'first']) \
                         .rename(columns={'first':'D','min':'C'}) \
                         .assign(D=lambda x: 125 - x['D'], C=lambda x: x['max'] - x['C']) \
                         .drop('max', axis=1) \
                         .reset_index() 
print (df)
   A    C    D
0  1  107  124
1  2   66  102
2  3    0  112
3  4    0   80

Upvotes: 3

Zero
Zero

Reputation: 77007

You could

In [1494]: df.groupby('A', as_index=False).B.agg(
              {'C': lambda x: x.max() - x.min(), 'D': lambda x: 125-x.iloc[0]})
Out[1494]:
   A    C    D
0  1  107  124
1  2   66  102
2  3    0  112
3  4    0   80

Upvotes: 0

DYZ
DYZ

Reputation: 57105

u = df.groupby('A').agg(['max', 'min', 'first'])
u.columns = 'max', 'min', 'first'
u['C'] = u['max'] - u['min']
u['D'] = 125 - u['first']
del u['min']
del u['max']
del u['first']
u.reset_index()
#   A    C    D
#0  1  107  124
#1  2   66  102
#2  3    0  112
#3  4    0   80

Upvotes: 0

Related Questions