Reputation: 386
I have a data frame with the high and low values for each day in different periods of time (A, B, C, D). I want to get the most frequent value in that range of values. This a reproducible example of my dataframe:
library(modeest)
library(tidyverse)
st <- as.Date("2021-05-19")
en <- as.Date("2021-07-16")
day_month = seq(st, en, by = "1 days")
low_A = seq(1.25, 15.75, by = 0.25)
high_A = seq(2.25, 16.75, by = 0.25)
low_B = seq(0.25, 14.75, by = 0.25)
high_B = seq(0.50, 15, by = 0.25)
low_C = seq(1.25, 15.75, by = 0.25)
high_C = seq(2.25, 16.75, by = 0.25)
low_D = seq(0.75, 15.25, by = 0.25)
high_D = seq(2.25, 16.75, by = 0.25)
df <- data.frame(day_month, high_A, low_A, high_B, low_B, high_C, low_C, high_D, low_D)
Given that I have a range of values for each day the price touch each value between the high and the low of that day. So for a given day, let's say the last day of my data frame which is 2021-07-16 the highest value was 16.75 while the lowest value of that day was 14.75.
tail(df)
day_month high_A low_A high_B low_B high_C low_C high_D low_D
54 2021-07-11 15.50 14.50 13.75 13.50 15.50 14.50 15.50 14.00
55 2021-07-12 15.75 14.75 14.00 13.75 15.75 14.75 15.75 14.25
56 2021-07-13 16.00 15.00 14.25 14.00 16.00 15.00 16.00 14.50
57 2021-07-14 16.25 15.25 14.50 14.25 16.25 15.25 16.25 14.75
58 2021-07-15 16.50 15.50 14.75 14.50 16.50 15.50 16.50 15.00
59 2021-07-16 16.75 15.75 15.00 14.75 16.75 15.75 16.75 15.25
So the price was between that range of value. If I use an interval of 0.25 points the price was running from 16.75, 16.50, 16.25, 16.00 ...15.00, 14.75. Each time has a range of values, for time A the range on the last day of my data frame was 15.50, 15.25, 15.00, 14.75, 14.50. for time B the range of the last day was 13.75, 13.50, and so on. What I want is to find the mode (or the most frequent value) in that day using those range of values. So I create this function:
poc_2 <- function(df){
# create the sequence of each period
x_A <- with(df, seq(low_A, high_A, by = 0.25))
x_B <- with(df, seq(low_B, high_B, by = 0.25))
x_C <- with(df, seq(low_C, high_C, by = 0.25))
x_D <- with(df, seq(low_D, high_D, by = 0.25))
# the range has different lenght so I use this to make all range of the same length with NA values
n <- max(length(x_A), length(x_B), length(x_C), length(x_D))
length(x_A) <- n
length(x_B) <- n
length(x_C) <- n
length(x_D) <- n
pf <- cbind(x_A, x_B,x_C, x_D)
xfd <- data.frame(pf)
# I change the format of my data frame so I can calculate the mode of all values
long <- xfd %>% gather(x, value, x_A:x_D)
# delete the NA values that are given by the change of length
long <- na.omit(long)
# get the mode of the last value
return(last(mfv(long$value)))
}
This code works and returns the expected value for just one row by using:
poc_2(df[59,])
[1] 16.75
Which gives the highest mode of that range on the last day. I want to calculate this for each row of my data frame. I tried several options that I found.
df %>% rowwise() %>% mutate(poc = poc_2())
# gives an error:
Error: Problem with `mutate()` input `poc`.
x argument "df" is missing, with no default
i Input `poc` is `poc_2()`.
i The error occurred in row 1.
I also tried:
apply(df, 1, poc_2)
Error in eval(substitute(expr), data, enclos = parent.frame()) :
invalid 'envir' argument of type 'character'
My question is: Is there any way I can get this function for all rows of my data frame so I can store a new variable with the mode of that day?
Upvotes: 0
Views: 976
Reputation: 388982
If you want to use your poc_2
function you can do that by using one of the option below -
library(tidyverse)
#1. sapply split
sapply(split(df, seq(nrow(df))),poc_2)
#2. by
by(df, seq(nrow(df)),poc_2)
#3. tidyverse
df %>% group_split(row_number()) %>%map_dbl(poc_2)
#4.
df %>% rowwise() %>% mutate(poc = poc_2(cur_data()))
However, why not use the Mode function from here and apply it for each day ?
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
df %>%
pivot_longer(cols = -day_month) %>%
group_by(day_month) %>%
summarise(frequent_value = Mode(value))
# day_month frequent_value
# <date> <dbl>
# 1 2021-05-19 2.25
# 2 2021-05-20 2.5
# 3 2021-05-21 2.75
# 4 2021-05-22 3
# 5 2021-05-23 3.25
# 6 2021-05-24 3.5
# 7 2021-05-25 3.75
# 8 2021-05-26 4
# 9 2021-05-27 4.25
#10 2021-05-28 4.5
# … with 49 more rows
Or using rowwise
-
df %>%
rowwise() %>%
mutate(frequent_value = Mode(c_across(-day_month))) %>%
ungroup
Upvotes: 1