Reputation: 497
I have a data that contains 10 lines (samples) that are either Class1
or Class2
and columns.
I want to calculate the mean of rows Class1
and make the result in line 11 and calculate the mean of rows Class2
and make the result in a line 12.
The data:
Name ClassType Col1 Col2 Col3
A Class1 10 50 12
B Class2 7 20 12
C Class1 8 12 8
D Class1 9 14 17
E Class2 3 15 14
F Class2 10 15 16
G Class2 12 22 15
H Class1 10 28 10
The result I want:
Name ClassType Col1 Col2 Col3
A Class1 10 50 12
B Class2 7 20 12
C Class1 8 12 8
D Class1 9 14 17
E Class2 3 15 14
F Class2 10 15 16
G Class2 12 22 15
H Class1 10 28 10
Mean class1 NA 9.25 26 11.75
Mean class2 NA 8 18 14.25
Upvotes: 1
Views: 784
Reputation: 1281
Try This!
import pandas as pd
mean_col1 = data[Col1].mean()
mean_col2 = data[Col2].mean()
data.loc[len(data.index)] = [None, mean_col1, mean_col2, None]
Upvotes: 0
Reputation: 16876
Here is another tidyverse
option, but doing everything in one pipe. We can first group by ClassType
, then summarize
over any columns that start with Col
. Also, in the summarize
statement, we can create the additional names that you want for the bottom rows (i.e., Mean Class 1
and Mean Class 2
) and just return NA
for ClassType
. Then, we can use bind_rows
to bind the original input dataframe to the new one (denoted by the .
) from the previous step.
df %>%
group_by(ClassType) %>%
summarize(Name = paste0("Mean ", unique(ClassType)),
across(starts_with("Col"), mean),
ClassType = NA) %>%
bind_rows(df, .)
Output
Name ClassType Col1 Col2 Col3
1 A Class1 10.00 50 12.00
2 B Class2 7.00 20 12.00
3 C Class1 8.00 12 8.00
4 D Class1 9.00 14 17.00
5 E Class2 3.00 15 14.00
6 F Class2 10.00 15 16.00
7 G Class2 12.00 22 15.00
8 H Class1 10.00 28 10.00
9 Mean Class1 <NA> 9.25 26 11.75
10 Mean Class2 <NA> 8.00 18 14.25
Data
df <- structure(
list(
Name = c("A", "B", "C", "D", "E", "F", "G", "H"),
ClassType = c(
"Class1",
"Class2",
"Class1",
"Class1",
"Class2",
"Class2",
"Class2",
"Class1"
),
Col1 = c(10L, 7L, 8L, 9L, 3L,
10L, 12L, 10L),
Col2 = c(50L, 20L, 12L, 14L, 15L, 15L, 22L, 28L),
Col3 = c(12L, 12L, 8L, 17L, 14L, 16L, 15L, 10L)
),
class = "data.frame",
row.names = c(NA,-8L)
)
Upvotes: 1
Reputation: 1154
You can summarize across columns. To build on @geometricfreedom's answer, here's how (I created a mini-version of your data set, using just the first four rows):
your_data <- data.frame(
Name = c("A", "B", "C", "D"),
ClassType = c("Class1", "Class2", "Class1", "Class1"),
Col1 = c(10, 7, 8, 9),
Col2 = c(50, 20, 12, 14),
Col3 = c(12, 12, 8, 17)
)
calc_cols <- your_data %>%
group_by(ClassType) %>%
summarize(
across(Col1:Col3, mean)
) %>%
mutate(ClassType = NA, Name = c("Mean Class 1", "Mean Class 2"))
your_data <- rbind(your_data, calc_cols)
your_data
Upvotes: 0
Reputation: 183
Try this
library(dplyr)
calc_cols <- your_dataframe %>% group_by(ClassType) %>%
summarise( Col1 = mean(Col1),
Col2 = mean(Col2),
Col3 = mean(Col3)) %>%
mutate( ClassType = NA,
Name = c("Mean class1", "Mean class2"))
your_new_dataframe <- rbind(your_dataframe, calc_cols)
Upvotes: 0