Reputation: 497

Calculate the mean of specific rows

I have a data that contains 10 lines (samples) that are either Class1 or Class2 and columns.

I want to calculate the mean of rows Class1 and make the result in line 11 and calculate the mean of rows Class2 and make the result in a line 12.

The data:

Name   ClassType   Col1   Col2   Col3
A      Class1       10     50    12
B      Class2        7     20    12
C      Class1        8     12     8
D      Class1        9     14    17
E      Class2        3     15    14
F      Class2       10     15    16
G      Class2       12     22    15
H      Class1       10     28    10

The result I want:

       Name   ClassType    Col1   Col2   Col3     
        A      Class1       10     50    12        
        B      Class2        7     20    12
        C      Class1        8     12     8
        D      Class1        9     14    17
        E      Class2        3     15    14
        F      Class2       10     15    16
        G      Class2       12     22    15
        H      Class1       10     28    10
 Mean class1    NA          9.25   26    11.75
 Mean class2    NA          8      18    14.25

Upvotes: 1

Answers (4)

Aravind R

Reputation: 1281

Try This!

import pandas as pd 

mean_col1 = data[Col1].mean() 
mean_col2 = data[Col2].mean()

data.loc[len(data.index)] = [None, mean_col1, mean_col2, None]

Upvotes: 0

AndrewGB

Reputation: 16876

Here is another tidyverse option, but doing everything in one pipe. We can first group by ClassType, then summarize over any columns that start with Col. Also, in the summarize statement, we can create the additional names that you want for the bottom rows (i.e., Mean Class 1 and Mean Class 2) and just return NA for ClassType. Then, we can use bind_rows to bind the original input dataframe to the new one (denoted by the .) from the previous step.

df %>%
  group_by(ClassType) %>%
  summarize(Name = paste0("Mean ", unique(ClassType)),
            across(starts_with("Col"), mean),
            ClassType = NA) %>%
  bind_rows(df, .)

Output

          Name ClassType  Col1 Col2  Col3
1            A    Class1 10.00   50 12.00
2            B    Class2  7.00   20 12.00
3            C    Class1  8.00   12  8.00
4            D    Class1  9.00   14 17.00
5            E    Class2  3.00   15 14.00
6            F    Class2 10.00   15 16.00
7            G    Class2 12.00   22 15.00
8            H    Class1 10.00   28 10.00
9  Mean Class1      <NA>  9.25   26 11.75
10 Mean Class2      <NA>  8.00   18 14.25

Data

df <- structure(
  list(
    Name = c("A", "B", "C", "D", "E", "F", "G", "H"),
    ClassType = c(
      "Class1",
      "Class2",
      "Class1",
      "Class1",
      "Class2",
      "Class2",
      "Class2",
      "Class1"
    ),
    Col1 = c(10L, 7L, 8L, 9L, 3L,
             10L, 12L, 10L),
    Col2 = c(50L, 20L, 12L, 14L, 15L, 15L, 22L, 28L),
    Col3 = c(12L, 12L, 8L, 17L, 14L, 16L, 15L, 10L)
  ),
  class = "data.frame",
  row.names = c(NA,-8L)
)

Upvotes: 1

rdelrossi

Reputation: 1154

You can summarize across columns. To build on @geometricfreedom's answer, here's how (I created a mini-version of your data set, using just the first four rows):

your_data <- data.frame(
  Name = c("A", "B", "C", "D"), 
  ClassType = c("Class1", "Class2", "Class1", "Class1"), 
  Col1 = c(10, 7, 8, 9), 
  Col2 = c(50, 20, 12, 14),
  Col3 = c(12, 12, 8, 17)
)

calc_cols <- your_data %>% 
  group_by(ClassType) %>% 
  summarize(
    across(Col1:Col3, mean)
  ) %>% 
  mutate(ClassType = NA, Name = c("Mean Class 1", "Mean Class 2"))

your_data <- rbind(your_data, calc_cols)

your_data

Upvotes: 0

geometricfreedom

Reputation: 183

Try this

library(dplyr)

calc_cols <- your_dataframe %>% group_by(ClassType) %>%
   summarise( Col1 = mean(Col1),
              Col2 = mean(Col2),
              Col3 = mean(Col3)) %>%
   mutate( ClassType = NA,
           Name = c("Mean class1", "Mean class2"))

your_new_dataframe <- rbind(your_dataframe,  calc_cols)

Upvotes: 0

Calculate the mean of specific rows

Answers (4)

Related Questions