Demis

Reputation: 117

Subtract consecutive columns in a pandas or PySpark DataFrame

I would like to perform the following operation on a pandas or PySpark DataFrame, but I still haven't found a solution.

I want to subtract the values from consecutive columns in a dataframe.

The operation I am describing can be seen in the image below.

Input and Output Dataframe

Bear in mind that the output DataFrame won't have any values in its first column, since the first column of the input has no preceding column to subtract.

Upvotes: 4

Views: 2251

Answers (2)

EdChum

Reputation: 393903

diff has an axis parameter, so you can do this in one step by passing axis=1:

In [63]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(3, 4), ['row1', 'row2', 'row3'], ['A', 'B', 'C', 'D'])
df

Out[63]:
             A         B         C         D
row1  0.146855  0.250781  0.766990  0.756016
row2  0.528201  0.446637  0.576045  0.576907
row3  0.308577  0.592271  0.553752  0.512420

In [64]:
df.diff(axis=1)

Out[64]:
       A         B         C         D
row1 NaN  0.103926  0.516209 -0.010975
row2 NaN -0.081564  0.129408  0.000862
row3 NaN  0.283694 -0.038520 -0.041331
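Because the example above uses random data, the result is hard to verify by eye. Here is a minimal sketch of the same call on fixed values; the frame `df_fixed` and its numbers are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values (an assumption, chosen so the differences
# are easy to check by hand).
df_fixed = pd.DataFrame(
    [[1, 3, 6, 10], [2, 2, 5, 4], [0, 7, 7, 9]],
    index=["row1", "row2", "row3"],
    columns=["A", "B", "C", "D"],
)

# axis=1 computes each column minus the column to its left;
# the first column has no left neighbour, so it becomes NaN.
col_diff = df_fixed.diff(axis=1)
print(col_diff)
```

For row1 this should give B - A = 2, C - B = 3 and D - C = 4, with NaN in column A.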

Upvotes: 4

piRSquared

Reputation: 294218

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(3, 4), ['row1', 'row2', 'row3'], ['A', 'B', 'C', 'D'])
df.T.diff().T  # transpose, diff down the rows, then transpose back
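A quick way to see what the double transpose does is to compare it with diff(axis=1) on the same frame. This is only a sketch: the name `df_demo` and the seed are assumptions for illustration, and it assumes an all-numeric frame (transposing a frame with mixed dtypes may upcast the values).

```python
import numpy as np
import pandas as pd

# Seeded random data so the comparison is repeatable (seed is arbitrary).
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    rng.random((3, 4)),
    index=["row1", "row2", "row3"],
    columns=["A", "B", "C", "D"],
)

# Transpose so columns become rows, take the default row-wise diff,
# then transpose back to the original orientation.
via_transpose = df_demo.T.diff().T

# The same operation expressed directly along the column axis.
via_axis = df_demo.diff(axis=1)

# Raises an AssertionError if the two results differ.
pd.testing.assert_frame_equal(via_transpose, via_axis)
```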


Upvotes: 1
