proximacentauri

Reputation: 1879

pandas take average on odd rows

I want to insert a new row after each row of a DataFrame, holding the average of the current and next row (all columns are numeric).

starting data:

   time value value_1  value-2
0   0    0      4        3
1   2    1      6        6

intermediate df:

   time value value_1  value-2
0   0    0      4        3
1   1    0      4        3     #duplicate of row 0
2   2    1      6        6
3   3    1      6        6     #duplicate of row 2

I would like to create df_1:

   time value value_1  value-2
0   0    0      4        3
1   1    0.5    5        4.5     #average of row 0 and 2
2   2    1      6        6
3   3    2      8        8       #average of row 2 and 4

To do this, I appended a copy of the starting dataframe to itself to create the intermediate dataframe shown above:

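import pandas as pd

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df_0, df_0])
df.sort_values(['time'], ascending=[True], inplace=True)
df = df.reset_index(drop=True)  # drop=True avoids adding an extra 'index' column
df['value_shift'] = df['value'].shift(-1)
df['value_shift_1'] = df['value_1'].shift(-1)
df['value_shift_2'] = df['value-2'].shift(-1)  # column is named 'value-2' in the tables above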

Then I was thinking of applying a function to each column:

def average_vals(row):
    #average every odd row
    if int(row.name) % 2 != 0:
        #take average of value and value_shift for each value
        #but this way I need to create 3 separate functions
        pass

Is there a way to do this without writing a separate function for each column and applying to each column one by one (in real data I have tens of columns)?

Upvotes: 2

Views: 425

Answers (1)

Chris Adams

Reputation: 18647

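How about this method using DataFrame.reindex and DataFrame.interpolate? Here df is the starting DataFrame (df_0 in the question), not the intermediate one:

import numpy as np
df.reindex(np.arange(len(df.index) * 2) / 2).interpolate().reset_index(drop=True)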

Explanation

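First, reindex in half steps with reindex(np.arange(len(df.index) * 2) / 2). Labels that do not exist in the original index (0.5, 1.5, ...) become new rows filled with NaN.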

This gives a DataFrame like this:

     time  value  value_1  value-2
0.0   0.0    0.0      4.0      3.0
0.5   NaN    NaN      NaN      NaN
1.0   2.0    1.0      6.0      6.0
1.5   NaN    NaN      NaN      NaN

Then use DataFrame.interpolate to fill in the NaN values. The default method is linear interpolation, so each NaN row that sits between two original rows becomes the mean of those two rows.
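As a quick check of that claim, here is a minimal sketch on a single column. The Series values are taken from value_1 in the question; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# One missing value between two known values, as in the reindexed frame.
s = pd.Series([4.0, np.nan, 6.0])

# method="linear" is the default, so no arguments are needed.
filled = s.interpolate()

# With exactly one gap between two neighbours, the linear fill
# coincides with their arithmetic mean.
midpoint = (s.iloc[0] + s.iloc[2]) / 2

print(filled.tolist())
print(midpoint)
```

This only holds because there is a single NaN between each pair of original rows; with two or more consecutive NaNs, linear interpolation would give evenly spaced values rather than the mean.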

Finally, use .reset_index(drop=True) to replace the half-step index with a plain 0, 1, 2, ... index.

This should give:

   time  value  value_1  value-2
0   0.0    0.0      4.0      3.0
1   1.0    0.5      5.0      4.5
2   2.0    1.0      6.0      6.0
3   2.0    1.0      6.0      6.0
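Putting the steps together, the following is a self-contained sketch that rebuilds the starting data from the question and applies the one-liner. The helper name insert_midpoint_rows is hypothetical, and the sketch assumes the frame has a default 0..n-1 index and only numeric columns.

```python
import numpy as np
import pandas as pd

# Starting data from the question (column names as shown in the tables).
df_0 = pd.DataFrame(
    {"time": [0, 2], "value": [0, 1], "value_1": [4, 6], "value-2": [3, 6]}
)


def insert_midpoint_rows(df):
    """Hypothetical helper: add one linearly interpolated row after each row."""
    # Assumes a default RangeIndex (0..n-1) and numeric columns only.
    half_steps = np.arange(len(df.index) * 2) / 2  # 0.0, 0.5, 1.0, 1.5, ...
    return df.reindex(half_steps).interpolate().reset_index(drop=True)


df_1 = insert_midpoint_rows(df_0)
print(df_1)
```

Because every column is handled at once, no per-column function is needed, however many columns there are. Note that the last row has no following row to average with, so interpolate carries the last known values forward and the final row ends up as a copy of the row before it.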

Upvotes: 2
