Reputation: 617
I'm breaking my head here with academic work. I have a data.frame with several numeric columns. I am using the command summarize
and group_by
in R to perform the average calculations of my data frame.
I tried with the code summarize (across (where (is.numeric), mean), -c(Mes, year_date)
, but it calculates the average of the entire data.frame and in addition, it creates a new column -c (Mes, year_date))
, I would like some numeric columns to be excluded from the media calculation, but continue on the data.frame.
Note that I tried -c(Mes, year_date)
to exclude these two columns from the average calculation, but it didn't work.
I tried
library(tidyr)
library(dplyr)
library(lubridate)
sample_station <-c('A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C','A','B','C','A','B','C')
Date_dmy <-c('01/01/2000','08/08/2000','16/03/2001','22/09/2001','01/06/2002','05/01/2002','26/01/2002','16/02/2002','09/03/2002','30/03/2002','20/04/2002','04/01/2000','11/08/2000','19/03/2001','25/09/2001','04/06/2002','08/01/2002','29/01/2002','19/02/2002','12/03/2002','13/09/2001','08/01/2000','15/08/2000','23/03/2001','29/09/2001','08/06/2002','12/01/2002','02/02/2002','23/02/2002','16/03/2002','06/04/2002','01/02/2000','01/02/2000','01/02/2000','02/11/2001','02/11/2001','02/11/2001')
temperature <-c(17,20,24,19,17,19,23,26,19,19,21,15,23,18,22,22,23,18,19,26,21,22,23,27,19,19,21,23,24,25,26,29,30,21,25,24,23)
wind_speed<-c(3.001,6.332,9.321,10.9091,6.38,10.5882,10.5,10.4348,10.3846,10.3448,10.3125,8.35,10.2632,10.2439,10.2273,10.2128,10.2,10.1887,10.1786,12,10.1613,10.1538,10.1471,10.1408,10.1351,10.1299,10.125,2.36,10.1163,10.1124,10.1087,11.2,10.102,10.099,10.0962,10.0935,10.0909)
esp<-c(11.6,11.3,11,10.7,10.4,10.1,9.8,9.5,9.2,8.9,8.6,8.3,8,11.2,10.9,10.6,10.3,10,12.8,12.5,12.2,11.9,11.6,11.3,11,4.36,4.06,3.76,3.46,3.16,2.86,2.56,2.26,1.96,1.66,1.36,23)
volum<-c(300,300,300,300,300,300,300,300,250,250,250,250,250,250,400,400,400,400,400,105,105,105,105,105,105,105,105,105,105,81,81,81,81,81,81,81,81)
df<-data.frame(sample_station, Date_dmy, temperature, wind_speed, esp, volum)%>%
mutate(Date_dmy = dmy(Date_dmy)) %>%
mutate(year_date = floor_date(Date_dmy,'year'))%>%
mutate(Ano=year(Date_dmy))%>%
mutate(Mes=month(Date_dmy))%>%
mutate(Epoca = ifelse(Mes %in% 4:9,'dry','rainy'))%>%
group_by(sample_station, Epoca, Ano)%>%
summarise(across(where(is.numeric), mean), -c(Mes, year_date))
I have several columns that I don't want to be averaged (even if they are numeric). For exemple, columns esp
and volum
.
update
Exit expectation
Upvotes: 0
Views: 960
Reputation: 1370
Because you are summarising only part of the data, you need to specify what data (rows) of the un-summarised data you want to maintain. In your example, you don't want to summarise Mes
and year_date
, however you have multiple values within each group (sample_station
, Epoca
, Ano
), of these Mes
and year_date
columns.
Which values of these unsummarised columns do you want to keep?
If you want to keep all values of the unsummarised columns, you may want to include Mes
and year_date
inside group_by(sample_station, Epoca, Ano)
before summarising.
Alternatively, you may use mutate()
rather than summarise()
to get summary values in a new column for each row of the original dataframe, then choose your rows from there.
Update:
Again, with your edited post including desired output, what values do you expect for Mes
. For example, when sample_station == 'A'
, Epoca == 'rainy'
and Ano == 2000
, you have values for Mes
of 1
& 2
, and the same year_date
. summarise()
wants to calculate one single summary value for this group.
Upvotes: 2
Reputation: 388982
You can use across(c(where(is.numeric), -Mes)
. Note that year_date
is not included in the calculation as it is not of class numeric and also because it is included in group_by
.
You can also combine multiple mutate
statements into one.
If you want to exclude certain columns from the average calculation but want to keep it in the dataframe you need to decide which value do you want to keep. For example, to keep the 1st value you can use first
.
library(dplyr)
library(lubridate)
data.frame(sample_station, Date_dmy, temperature, wind_speed)%>%
mutate(Date_dmy = dmy(Date_dmy),
year_date = floor_date(Date_dmy,'year'),
Ano=year(Date_dmy),
Mes=month(Date_dmy),
Epoca = ifelse(Mes %in% 4:9,'dry','rainy')) %>%
group_by(sample_station, year_date, Epoca) %>%
summarise(across(c(where(is.numeric), -Mes), mean),
across(Mes, first))
Upvotes: 0