MichaelD
MichaelD

Reputation: 51

Convert factor variable minute:second to numerical variable minute.seconds in R

I am struggling with a data frame I have been given:

 game.time.total game.time.first.half game.time.second.half
1           95:09                46:04                 49:05
2           95:09                46:04                 49:05
3           95:31                46:07                 49:23
4           95:31                46:07                 49:23
5           95:39                46:08                 49:31

Currently, these columns are currently factor variables (see str output)

'data.frame':   5 obs. of  3 variables:
 $ game.time.total      : Factor w/ 29 levels "100:22","100:53",..: 7 7 10 10 12
 $ game.time.first.half : Factor w/ 27 levels "45:18","46:00",..: 3 3 5 5 6
 $ game.time.second.half: Factor w/ 29 levels "48:01","48:03",..: 12 12 15 15 17

However I wish to be able to average each column using colmeans(). From my understanding I need to convert the column to numeric and to be expressed as minutes.seconds as shown here:

game.time.total game.time.first.half game.time.second.half
1           95.09                46.04                 49.05
2           95.09                46.04                 49.05
3           95.31                46.07                 49.23
4           95.31                46.07                 49.23
5           95.39                46.08                 49.31

I understand that I could just type them out however there are many more column and rows of similar formatting...Is there a simple way of how to do this? Or do I need to re-adjust the format of the original file(.csv)?

EDIT: Thank you for the answers. My mistake as in my original question I did not provide my actual DF. I have now added this and with the str() result.

@hello_friend this is what is returned when I apply your second solution

 game.time.total game.time.first.half game.time.second.half
1               7                    3                    12
2               7                    3                    12
3              10                    5                    15
4              10                    5                    15
5              12                    6                    17

Thanks in advance.

Upvotes: 2

Views: 336

Answers (3)

hello_friend
hello_friend

Reputation: 5788

Base R solution:

numeric_df <- setNames(data.frame(lapply(data.frame(
  Vectorize(gsub)(":", ".", DF), stringsAsFactors = FALSE
),
function(x) {
  as.double(x)
})), names(DF))

Data:

 DF <- structure(list(game.time.total = c("95:09", "95:09", "95:31", 
"95:31", "95:39"), game.time.first.half = c("46:04", "46:04", 
"46:07", "46:07", "46:08"), game.time.second.half = c("49:05", 
"49:05", "49:23", "49:23", "49:31")), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1

Allan Cameron
Allan Cameron

Reputation: 173858

You have to be careful here. Think about the average of "89:30" and "90:30". They add to 180 minutes, so the mean should be 90:00. However, if you convert them to 89.30 and 90.30, then they add to 179.60 and the mean becomes 89.80, which doesn't even make sense.

There are packages available that make dealing with time easier, such as lubridate, and if you deal with times frequently you should learn to use them. However, the solution below doesn't require any additional packages and is quite straightforward. It defines two small functions to convert between "mm:ss" format and number of seconds. You can safely do sums and averages on seconds then convert back to your original format:

as_seconds <- function(x) sapply(strsplit(x, ":"), function(y) sum(as.numeric(y) * c(60, 1)))

as_min_sec <- function(x) paste0(x %/% 60, sprintf(":%02d", 21))

apply(DF, 2, function(x) as_min_sec(mean(as_seconds(x))))
#>       game.time.total  game.time.first.half game.time.second.half 
#>               "95:21"               "46:21"               "49:21"

If you just want to replace the colons with dots in each column, you can do this:

as.data.frame(lapply(DF, function(x) gsub(":", ".", x)))
#>   game.time.total game.time.first.half game.time.second.half
#> 1           95.09                46.04                 49.05
#> 2           95.09                46.04                 49.05
#> 3           95.31                46.07                 49.23
#> 4           95.31                46.07                 49.23
#> 5           95.39                46.08                 49.31

Upvotes: 1

Ahorn
Ahorn

Reputation: 3876

You can convert the columns to minutes and seconds with the ms function from lubridate package.

library(lubridate)
library(dplyr)

DF %>% 
  mutate_all(ms) %>% 
  mutate_all(period_to_seconds) %>% 
  summarise_all(mean) %>% 
  mutate_all(seconds_to_period)

           game.time.total game.time.first.half game.time.second.half
1 1H 35M 23.8000000000002S               46M 6S 49M 17.4000000000001S

Or without the last mutate_all call if you want the mean in seconds.

DF %>% 
  mutate_all(ms) %>% 
  mutate_all(period_to_seconds) %>% 
  summarise_all(mean)

  game.time.total game.time.first.half game.time.second.half
1          5723.8                 2766                2957.4

Note: Asssuming 95.09 means 95 minutes and 9 seconds and not 95 minutes and .09 minutes

Upvotes: 1

Related Questions