Subtract consecutive columns in a Pandas or PySpark DataFrame

Question

I would like to perform the following operation on a pandas or PySpark dataframe, but I still haven't found a solution.

I want to subtract the values from consecutive columns in a dataframe.

The operation I am describing can be seen in the image below.

Bear in mind that the output dataframe won't have any values in its first column: the first column of the input has no previous column to subtract from it.

EdChum · Accepted Answer

DataFrame.diff has an axis parameter, so passing axis=1 does this in one step:

In [63]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(3, 4), index=['row1', 'row2', 'row3'], columns=['A', 'B', 'C', 'D'])
df

Out[63]:
             A         B         C         D
row1  0.146855  0.250781  0.766990  0.756016
row2  0.528201  0.446637  0.576045  0.576907
row3  0.308577  0.592271  0.553752  0.512420

In [64]:
df.diff(axis=1)

Out[64]:
       A         B         C         D
row1 NaN  0.103926  0.516209 -0.010975
row2 NaN -0.081564  0.129408  0.000862
row3 NaN  0.283694 -0.038520 -0.041331
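
Because the frame above is random, the numbers are hard to check by eye. Here is a minimal sketch with fixed values (made up for illustration, not taken from the question) showing that diff(axis=1) gives each column minus the column to its left, and that this should match subtracting a column-shifted copy of the frame:

```python
import pandas as pd

# Small deterministic frame; the values are illustrative assumptions.
df = pd.DataFrame(
    {"A": [1.0, 4.0], "B": [3.0, 4.5], "C": [6.0, 2.5]},
    index=["row1", "row2"],
)

# Each column minus the column to its left. The first column has no
# left neighbour, so it comes out as NaN.
by_diff = df.diff(axis=1)

# Equivalent spelling: shift every column one position to the right,
# then subtract the shifted frame from the original.
by_shift = df - df.shift(axis=1)

print(by_diff)
print(by_diff.equals(by_shift))
```

The shift form is handy if you later need a different operation between neighbouring columns (a ratio, for example), since only the operator changes.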
