Reputation: 7107
This is a side question to one I am asking here, but the problem is different.
I would like to know where I am going wrong in my code when mapping over nested tibbles. The data can be generated using:
library(tidyverse) # dplyr, tidyr (nest/unnest) and stringr (str_pad) are used below
library(tidyquant)
library(lubridate)

tickers <- c("GIS", "KR", "MKC", "SJM", "EL", "HRL", "HSY", "K",
             "KMB", "MDLZ", "MNST", "PEP", "PG", "PM", "SYY", "TAP", "TSN", "WBA", "WMT",
             "MMM", "ABMD", "ACN", "AMD", "AES", "AON", "ANTM", "APA", "CSCO", "CMS", "KO", "GRMN", "GPS",
             "JEC", "SJM", "JPM", "JNPR", "KSU", "KEYS", "KIM", "NBL", "NEM", "NWL", "NFLX", "NEE", "NOC", "TMO", "TXN", "TWTR")

data <- tq_get(tickers,
               get = "stock.prices", # Collect the stock price data from 2010 - 2015
               from = "2010-01-01",
               to = "2015-01-01") %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted, # Convert the data from daily prices to monthly prices
               mutate_fun = periodReturn,
               period = "monthly",
               type = "arithmetic")

df_monthly <- data %>%
  mutate(year = year(date)) %>%
  group_by(symbol, year) %>% # I group_by and nest the data in order to create the event data which remains fixed over the monthly periods
  nest() %>%
  mutate( # Here I randomly create the dates
    release_date = paste(year,
                         str_pad(ceiling(runif(row_number(), min = 1, max = 12)), 2, pad = "0"), # Create the months 1 - 12
                         str_pad(ceiling(runif(row_number(), min = 1, max = 27)), 2, pad = "0"), # Create the days - I choose 27 days in a month since later I set the days to the end of month day
                         sep = "-"),
    score = runif(row_number(), min = 0, max = 1), # Randomly generate some scoring function
    release_date = as.Date(release_date),
    release_date = ceiling_date(release_date, "month") - days(1) # This gives the end of month date
  ) %>%
  unnest() %>% # unnest to expand the yearly release_date and score to the monthly data
  ungroup() %>%
  mutate_if(is.integer, as.numeric) %>%
  arrange(release_date)
The part I am stuck on is this:
d <- df_monthly %>%
  group_by(release_date) %>%
  nest() %>%
  map(~data %>%
        mutate(ntile_score = ntile(score, 2))
  )
This also does not work:
df_monthly %>%
  group_by(release_date) %>%
  nest() %>%
  map(~data %>%
        mutate(ntile_score = ~ntile(.x$score, 2))
  )
What I would like to do is map over the nested data and compute the ntiles. I have tried a number of different methods but cannot seem to get it working.
Upvotes: 1
Views: 720
Reputation: 887158
We need to call map inside mutate, or else extract the 'data' column with pull or .$ if a standalone map needs to be applied on 'data'.
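To see why the standalone call in the question does not reach the nested tibbles, here is a minimal sketch on a toy tibble (the `toy` data and its values are made up for illustration, not the real stock data). A tibble is a list of columns, so piping the nested result straight into map visits each column (release_date, then data) rather than each nested tibble. In addition, `~data` inside the lambda never refers to `.x`; it most likely resolves to whatever object named `data` exists in the calling environment, which in the question is the monthly returns tibble without a score column.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for df_monthly
toy <- tibble(
  release_date = as.Date(c("2010-02-28", "2010-02-28", "2010-03-31", "2010-03-31")),
  score        = c(0.1, 0.9, 0.4, 0.6)
)

nested <- toy %>%
  group_by(release_date) %>%
  nest() %>%
  ungroup()

# map() over a tibble iterates over its columns, not over the nested tibbles
visited <- map_int(nested, length)
names(visited)
```

The names of `visited` are the column names of the nested tibble, which shows that the lambda is called once per column.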
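library(dplyr)
library(tidyr) # for nest()
library(purrr)
out <- df_monthly %>%
  group_by(release_date) %>%
  nest() %>%
  # map over the list-column 'data'; .x is the nested tibble for one release_date
  mutate(data = map(data, ~ .x %>%
    mutate(ntile_score = ntile(score, 2))))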
out
# A tibble: 55 x 2
# Groups: release_date [55]
# release_date data
# <date> <list>
# 1 2010-02-28 <tibble [72 × 6]>
# 2 2010-03-31 <tibble [24 × 6]>
# 3 2010-04-30 <tibble [96 × 6]>
# 4 2010-05-31 <tibble [12 × 6]>
# 5 2010-06-30 <tibble [60 × 6]>
# 6 2010-07-31 <tibble [72 × 6]>
# 7 2010-08-31 <tibble [48 × 6]>
# 8 2010-09-30 <tibble [24 × 6]>
# 9 2010-10-31 <tibble [12 × 6]>
#10 2010-11-30 <tibble [72 × 6]>
# … with 45 more rows
Checking one of the list elements:
out$data[[1]]
# A tibble: 72 x 6
# symbol year date monthly.returns score ntile_score
# <chr> <dbl> <date> <dbl> <dbl> <int>
# 1 PM 2010 2010-01-29 -0.0778 0.450 1
# 2 PM 2010 2010-02-26 0.0762 0.450 1
# 3 PM 2010 2010-03-31 0.0767 0.450 1
# 4 PM 2010 2010-04-30 -0.0590 0.450 1
# 5 PM 2010 2010-05-28 -0.101 0.450 1
# 6 PM 2010 2010-06-30 0.0522 0.450 1
# 7 PM 2010 2010-07-30 0.113 0.450 1
# 8 PM 2010 2010-08-31 0.00627 0.450 1
# 9 PM 2010 2010-09-30 0.103 0.450 1
#10 PM 2010 2010-10-29 0.0444 0.450 1
# … with 62 more rows
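The other option mentioned above, extracting the list-column first and then applying a standalone map, could look roughly like the following. This is only a sketch on a made-up `toy` tibble (hypothetical symbols and scores) so that it runs without downloading prices.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for df_monthly
toy <- tibble(
  release_date = as.Date(c("2010-02-28", "2010-02-28", "2010-03-31", "2010-03-31")),
  symbol       = c("AAA", "BBB", "AAA", "BBB"),
  score        = c(0.1, 0.9, 0.6, 0.4)
)

res <- toy %>%
  group_by(release_date) %>%
  nest() %>%
  pull(data) %>% # extract the list of nested tibbles
  map(~ .x %>% mutate(ntile_score = ntile(score, 2)))

res[[1]]
```

Note that `res` is a plain list of tibbles, so the release_date keys are no longer attached; the mutate version keeps them alongside the nested data. Replacing `pull(data)` with `.$data` should behave the same way.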
Upvotes: 2