betelgeuse
betelgeuse

Reputation: 469

Combine Rows in Pandas

I have a pandas DataFrame like this

         100  200  300
283.1   0.01 0.02 0.40
284.1   0.02 0.03 0.42
285.1   0.05 0.01 0.8
286.1   0.06 0.02 0.9

I need to combine a certain number of consecutive rows and calculate the average value for each column and a new index as the average of the indices I used, in order to obtain something like this:

         100   200  300
283.6   0.015 0.025 0.41  
285.6   0.055 0.015 0.85

Is there a way to to this with pandas?

Upvotes: 2

Views: 2441

Answers (2)

Vidhya G
Vidhya G

Reputation: 2330

A simple way:

>>> print df
   index   100   200   300                                                        
0  283.1  0.01  0.02  0.40                                                        
1  284.1  0.02  0.03  0.42                                                        
2  285.1  0.05  0.01  0.80                                                        
3  286.1  0.06  0.02  0.90 

break the DataFrame up into the portions that you want and find the mean of the relevant columns:

>>> pieces = [df[:2].mean(), df[2:].mean()]    

then put the pieces back together using concat:

>>> avgdf = pd.concat(pieces, axis=1).transpose()

   index    100    200   300                                                      
0  283.6  0.015  0.025  0.41                                                      
1  285.6  0.055  0.015  0.85

Alternatively, you can recombine the data with a list comprehension [i for i in pieces] or a generator expression:

>>> z = (i for i in pieces)

and use this to create your new DataFrame:

>>> avgdf = pd.DataFrame(z)

Finally, to set the index:

>>> avgdf.set_index('index', inplace=True)                                                 
>>> print avgdf

         100    200   300                                                         
index                                                                             
283.6  0.015  0.025  0.41                                                         
285.6  0.055  0.015  0.85 

Upvotes: 0

Zero
Zero

Reputation: 77027

Yes -- you could do this in Pandas. Here's one way to do it.

Let's say out, our initial dataframe df is like

   index   100   200   300
0  283.1  0.01  0.02  0.40
1  284.1  0.02  0.03  0.42
2  285.1  0.05  0.01  0.80
3  286.1  0.06  0.02  0.90

Now, calculate the length of dataframe

N = len(df.index)
N
4

We create a grp column -- to be used for aggregation,

where for 2 rows aggregation use [x ]*2 and for n-consecutive rows use [x]*n

df['grp'] = list(itertools.chain.from_iterable([x]*2 for x in range(0, N/2)))
df
   index   100   200   300  grp
0  283.1  0.01  0.02  0.40    0
1  284.1  0.02  0.03  0.42    0
2  285.1  0.05  0.01  0.80    1
3  286.1  0.06  0.02  0.90    1

Now, get the means by grouping the grp column like this --

df.groupby('grp').mean()
     index    100    200   300
grp
0    283.6  0.015  0.025  0.41
1    285.6  0.055  0.015  0.85

Upvotes: 3

Related Questions