John

Reputation: 503

Python column difference

I need to create a column that computes the difference between another column's elements:

Column A    Computed Column
10           blank  # nothing to compute for first record
9            1  # = 10-9
7            2  # = 9-7
4            3  # = 7-4

I assume this calls for a lambda function, but I am not sure how to reference the elements in 'Column A'.

Any help or direction you can provide would be great. Thanks!

Upvotes: 0

Views: 1539

Answers (3)

pstatix

Reputation: 3848

I would just do:

df = pd.DataFrame(data=[10,9,7,4], columns=['A'])
df['B'] = abs(df['A'].diff())

abs() is there because diff() computes current - previous, whereas you want previous - current. diff() is already built into the Series class, and taking the absolute value gives the result you want as long as the values in the column never increase. Where a value does increase, abs() discards the sign.

To demonstrate:

import pandas as pd
df = pd.DataFrame(data=[10,9,7,4], columns=['A'])
df['B'] = abs(df['A'].diff())
>>> df
# Output
    A    B
0  10  NaN
1   9  1.0
2   7  2.0
3   4  3.0
df2 = pd.DataFrame(data=[10,4,7,9], columns=['A'])
df2['B'] = abs(df2['A'].diff())
>>> df2
# Output
    A    B
0  10  NaN
1   4  6.0
2   7  3.0
3   9  2.0

To keep the sign of previous - current and still outperform @cosmic_inquiry's solution:

import pandas as pd
df = pd.DataFrame(data=[10,9,7,4], columns=['A'])
df2 = pd.DataFrame(data=[10,4,7,9], columns=['A'])
df['B'] = df['A'].diff() * -1
df2['B'] = df2['A'].diff() * -1
>>> df
# Output:
    A    B
0  10  NaN
1   9  1.0
2   7  2.0
3   4  3.0
>>> df2
# Output:
    A    B
0  10  NaN
1   4  6.0
2   7 -3.0
3   9 -2.0
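
The thread gives no measurements for the performance comparison, so here is a minimal sketch for checking it yourself. The test data (100,000 seeded random integers), the repeat count, and the labels in `candidates` are assumptions made for illustration; timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 random integers, seeded for repeatability.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100, size=100_000))

# The three spellings of "previous minus current" used in this thread.
candidates = {
    'negated diff': lambda: s.diff() * -1,
    'shift and subtract': lambda: s.shift() - s,
    'diff(-1).shift(1)': lambda: s.diff(-1).shift(1),
}

# Confirm the candidates agree before comparing their speed.
reference = candidates['negated diff']()
all_agree = all(fn().equals(reference) for fn in candidates.values())
print('all agree:', all_agree)

# Total seconds for 20 runs of each candidate.
timings = {name: timeit.timeit(fn, number=20) for name, fn in candidates.items()}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s')
```

Run it on data shaped like your own before drawing conclusions; on a four-row frame the differences are unlikely to matter.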

Upvotes: 0

Spinor8

Reputation: 1607

You can do it by shifting the column.

import pandas as pd

dict1 = {'A': [10,9,7,4]}
df = pd.DataFrame.from_dict(dict1)

df['Computed'] = df['A'].shift() - df['A']
print(df)

giving

    A  Computed
0  10       NaN
1   9       1.0
2   7       2.0
3   4       3.0

EDIT: The OP extended the requirement to multiple columns.

dict1 = {'A': [10,9,7,4], 'B': [10,9,7,4], 'C': [10,9,7,4]}
df = pd.DataFrame.from_dict(dict1)

columns_to_update = ['A', 'B']
for col in columns_to_update:
    df['Computed'+col] = df[col].shift() - df[col]
print(df)

By editing the columns_to_update list, you can choose which columns get a computed column, giving

    A   B   C  ComputedA  ComputedB
0  10  10  10        NaN        NaN
1   9   9   9        1.0        1.0
2   7   7   7        2.0        2.0
3   4   4   4        3.0        3.0
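
The loop can also be written without iterating, since shift() and subtraction work on a whole DataFrame. This is a sketch of that alternative, not part of the original answer; the names `computed` and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 9, 7, 4], 'B': [10, 9, 7, 4], 'C': [10, 9, 7, 4]})
columns_to_update = ['A', 'B']

# Previous row minus current row for every selected column at once.
computed = df[columns_to_update].shift() - df[columns_to_update]

# add_prefix renames A -> ComputedA and B -> ComputedB before joining back.
result = df.join(computed.add_prefix('Computed'))
print(result)
```

The result has the same columns as the loop version. Unlike the loop, it leaves df itself unchanged and returns a new frame.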

Upvotes: 2

cosmic_inquiry

Reputation: 2684

Use diff.

import pandas as pd
df = pd.DataFrame(data=[10,9,7,4], columns=['A'])
df['B'] = df.A.diff(-1).shift(1)

Output:

df
Out[140]: 
    A    B
0  10  NaN
1   9  1.0
2   7  2.0
3   4  3.0
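
It may help to see the two steps separately. The sketch below splits diff(-1).shift(1) into its intermediate results and compares it with the other two forms from this thread; the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(data=[10, 9, 7, 4], columns=['A'])

# diff(-1) subtracts the NEXT row from the current one: 10-9, 9-7, 7-4, NaN.
forward = df['A'].diff(-1)

# shift(1) moves each result down one row so it lines up with the later value.
shifted = forward.shift(1)

# Two other spellings of "previous minus current" for comparison.
via_shift = df['A'].shift() - df['A']
via_negated_diff = -df['A'].diff()

print(forward.tolist())
print(shifted.tolist())
print(shifted.equals(via_shift), shifted.equals(via_negated_diff))
```

All three produce the same column, so the choice between them is mostly a matter of readability.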

Upvotes: 0
