David Warren
David Warren

Reputation: 179

Comparing each row of a dataframes' values with the first row to see change from starting point

I have a dataframe of price changes by category (columns) and date (rows) and I am keen to compare each subsequent row after the first, back to the first to see how it has changed as a percentage over time.

For example, my current df looks like:

df = pd.DataFrame({'date': ['2020-03-22', '2020-03-23', '2020-03-24', '2020-03-25', '2020-03-26'], 'price': [2, 3, 6, 7, 12]})

 |  date       |  price
 |-------------|--------
0|  2020-03-22 |    2
1|  2020-03-23 |    3
2|  2020-03-24 |    6
3|  2020-03-25 |    5
4|  2020-03-26 |    12

and I am keen to change this price column to be the comparison back to row 0 such as the below:

 |  date       |  price
 |-------------|--------
0|  2020-03-22 |    1
1|  2020-03-23 |    1.5
2|  2020-03-24 |    3
3|  2020-03-25 |    2.5
4|  2020-03-26 |    6

Note, my actual data frame is approx 25 cols wide and 350 rows, so cherry-picking columns and rows won't work. Im thinking perhaps a loop could work, but am concerned that would be rather resource heavy. Please note, the dataframe itself is produced out of a pivot function, not sure if that helps at all.

Any help will be appreciated!

Upvotes: 1

Views: 1209

Answers (1)

jezrael
jezrael

Reputation: 862681

Use Series.div for divide by first value of column:

df['price'] = df['price'].div(df['price'].iloc[0])
print (df)
         date  price
0  2020-03-22    1.0
1  2020-03-23    1.5
2  2020-03-24    3.0
3  2020-03-25    3.5
4  2020-03-26    6.0

If need working with multiple columns without first column use DataFrame.iloc for select:

df.iloc[:, 1:] = df.iloc[:, 1:].div(df.iloc[0, 1:])
print (df)
         date price
0  2020-03-22   1.0
1  2020-03-23   1.5
2  2020-03-24   3.0
3  2020-03-25   3.5
4  2020-03-26   6.0

Or create DatetimeIndex:

df = df.set_index('date')
df= df.div(df.iloc[0])
print (df)
            price
date             
2020-03-22    1.0
2020-03-23    1.5
2020-03-24    3.0
2020-03-25    3.5
2020-03-26    6.0

If ned working onlt with numeric columns:

cols = df.select_dtypes(np.number).columns
df[cols] = df[cols].div(df.iloc[0, df.columns.get_indexer(cols)])

Another solution from @piRSquared, thank you:

df = df.assign(**df.select_dtypes(np.number).pipe(lambda d: d / d.iloc[0]))

print (df)
         date price
0  2020-03-22   1.0
1  2020-03-23   1.5
2  2020-03-24   3.0
3  2020-03-25   3.5
4  2020-03-26   6.0

Upvotes: 2

Related Questions