Calculate mean for selected rows for selected columns in pandas data frame

Question

I have pandas df with say, 100 rows, 10 columns, (actual data is huge). I also have row_index list which contains, which rows to be considered to take mean. I want to calculate mean on say columns 2,5,6,7 and 8. Can we do it with some function for dataframe object?

What I know is do a for loop, get value of row for each element in row_index and keep doing mean. Do we have some direct function where we can pass row_list, and column_list and axis, for ex df.meanAdvance(row_list,column_list,axis=0) ?

I have seen DataFrame.mean() but it didn't help I guess.

  a b c d q 
0 1 2 3 0 5
1 1 2 3 4 5
2 1 1 1 6 1
3 1 0 0 0 0

I want mean of 0, 2, 3 rows for each a, b, d columns

  a b d
0 1 1 2

PdevG · Accepted Answer

To select the rows of your dataframe you can use iloc, you can then select the columns you want using square brackets.

For example:

 df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns = ['a','b','c'])

gives the following dataframe:

to select only the 3d and fifth row you can do:

df.iloc[[2,4]]

which returns:

   a  b  c
5  1  2  3
7  1  2  3

if you then want to select only columns b and c you use the following command:

df[['b', 'c']].iloc[[2,4]]

which yields:

   b  c
5  2  3
7  2  3

To then get the mean of this subset of your dataframe you can use the df.mean function. If you want the means of the columns you can specify axis=0, if you want the means of the rows you can specify axis=1

thus:

df[['b', 'c']].iloc[[2,4]].mean(axis=0)

returns:

b    2
c    3

As we should expect from the input dataframe.

For your code you can then do:

 df[column_list].iloc[row_index_list].mean(axis=0)

EDIT after comment: New question in comment: I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?

Answer:

If i understand correctly, the following code should do the trick (Same df as above, as columns I took 'a' and 'b':

first you loop over all the lists of rows, collection all the means as pd.series, then you concatenate the resulting list of series over axis=1, followed by taking the transpose to get it in the right format.

dfs = list()
for l in L:
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))

mean_matrix = pd.concat(dfs, axis=1).T

Calculate mean for selected rows for selected columns in pandas data frame

Answers (2)

Related Questions