Reputation: 51
I am struggling with a data frame I have been given:
game.time.total game.time.first.half game.time.second.half
1 95:09 46:04 49:05
2 95:09 46:04 49:05
3 95:31 46:07 49:23
4 95:31 46:07 49:23
5 95:39 46:08 49:31
Currently, these columns are currently factor variables (see str output)
'data.frame': 5 obs. of 3 variables:
$ game.time.total : Factor w/ 29 levels "100:22","100:53",..: 7 7 10 10 12
$ game.time.first.half : Factor w/ 27 levels "45:18","46:00",..: 3 3 5 5 6
$ game.time.second.half: Factor w/ 29 levels "48:01","48:03",..: 12 12 15 15 17
However I wish to be able to average each column using colmeans(). From my understanding I need to convert the column to numeric and to be expressed as minutes.seconds as shown here:
game.time.total game.time.first.half game.time.second.half
1 95.09 46.04 49.05
2 95.09 46.04 49.05
3 95.31 46.07 49.23
4 95.31 46.07 49.23
5 95.39 46.08 49.31
I understand that I could just type them out however there are many more column and rows of similar formatting...Is there a simple way of how to do this? Or do I need to re-adjust the format of the original file(.csv)?
EDIT: Thank you for the answers. My mistake as in my original question I did not provide my actual DF. I have now added this and with the str() result.
@hello_friend this is what is returned when I apply your second solution
game.time.total game.time.first.half game.time.second.half
1 7 3 12
2 7 3 12
3 10 5 15
4 10 5 15
5 12 6 17
Thanks in advance.
Upvotes: 2
Views: 336
Reputation: 5788
Base R solution:
numeric_df <- setNames(data.frame(lapply(data.frame(
Vectorize(gsub)(":", ".", DF), stringsAsFactors = FALSE
),
function(x) {
as.double(x)
})), names(DF))
Data:
DF <- structure(list(game.time.total = c("95:09", "95:09", "95:31",
"95:31", "95:39"), game.time.first.half = c("46:04", "46:04",
"46:07", "46:07", "46:08"), game.time.second.half = c("49:05",
"49:05", "49:23", "49:23", "49:31")), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1
Reputation: 173858
You have to be careful here. Think about the average of "89:30"
and "90:30"
. They add to 180 minutes, so the mean should be 90:00
. However, if you convert them to 89.30
and 90.30
, then they add to 179.60
and the mean becomes 89.80
, which doesn't even make sense.
There are packages available that make dealing with time easier, such as lubridate
, and if you deal with times frequently you should learn to use them. However, the solution below doesn't require any additional packages and is quite straightforward. It defines two small functions to convert between "mm:ss"
format and number of seconds. You can safely do sums and averages on seconds then convert back to your original format:
as_seconds <- function(x) sapply(strsplit(x, ":"), function(y) sum(as.numeric(y) * c(60, 1)))
as_min_sec <- function(x) paste0(x %/% 60, sprintf(":%02d", 21))
apply(DF, 2, function(x) as_min_sec(mean(as_seconds(x))))
#> game.time.total game.time.first.half game.time.second.half
#> "95:21" "46:21" "49:21"
If you just want to replace the colons with dots in each column, you can do this:
as.data.frame(lapply(DF, function(x) gsub(":", ".", x)))
#> game.time.total game.time.first.half game.time.second.half
#> 1 95.09 46.04 49.05
#> 2 95.09 46.04 49.05
#> 3 95.31 46.07 49.23
#> 4 95.31 46.07 49.23
#> 5 95.39 46.08 49.31
Upvotes: 1
Reputation: 3876
You can convert the columns to minutes and seconds with the ms
function from lubridate package.
library(lubridate)
library(dplyr)
DF %>%
mutate_all(ms) %>%
mutate_all(period_to_seconds) %>%
summarise_all(mean) %>%
mutate_all(seconds_to_period)
game.time.total game.time.first.half game.time.second.half
1 1H 35M 23.8000000000002S 46M 6S 49M 17.4000000000001S
Or without the last mutate_all
call if you want the mean in seconds.
DF %>%
mutate_all(ms) %>%
mutate_all(period_to_seconds) %>%
summarise_all(mean)
game.time.total game.time.first.half game.time.second.half
1 5723.8 2766 2957.4
Note: Asssuming 95.09
means 95 minutes and 9 seconds and not 95 minutes and .09 minutes
Upvotes: 1