Reputation: 1304
I'm trying to build a histogram from my data. It looks like this: a data frame
where each row holds a range of years (or a single year). I need a histogram of all the values covered by those ranges in my df.
year <- c("1925:2002",
"2008",
"1925:2002",
"1925:2002",
"1925:2002",
"2008:2013",
"1934",
"1972:1988")
All I was able to figure out is to convert every string to a sequence with seq(),
but it doesn't work properly:
for (i in 1:length(year)) {
rr[i] <- seq(
as.numeric(unlist(strsplit(year[i], ":"))[1]),
as.numeric(unlist(strsplit(year[i], ":"))[2])
)
}
Upvotes: 0
Views: 92
Reputation: 78792
Tick the answer box for @MrFlick. I had done this at the same time and the only difference is the piping:
library(magrittr)
strsplit(year, ":") %>%
lapply(as.integer) %>%
lapply(function(x) seq(x[1], x[length(x)])) %>%
unlist() %>%
hist()
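The `x[length(x)]` indexing is what lets single years through without a special case: for a one-element vector the first and last elements are the same, so `seq()` just returns that year. A minimal sketch on two made-up inputs (the `expand` helper name is hypothetical, used only for illustration):

```r
# Hypothetical helper, named here only for illustration
expand <- function(x) seq(x[1], x[length(x)])

# A single year: first and last element coincide
single <- expand(as.integer("2008"))

# A range: first and last element are the two bounds
ranged <- expand(as.integer(strsplit("2008:2013", ":")[[1]]))

print(single)  # one value
print(ranged)  # six consecutive years
```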
Full-on tidyverse:
library(tidyverse)
str_split(year, ":") %>%
map(as.integer) %>%
map(~seq(.x[1], .x[length(.x)])) %>%
flatten_int() %>%
hist()
To back up my comments before any "tidyverse 4eva" folks join the fray:
library(tidyverse)
library(microbenchmark)
microbenchmark(
base = as.integer(
unlist(
lapply(
lapply(
strsplit(year, ":"),
as.integer
),
function(x) seq(x[1], x[length(x)])
),
use.names = FALSE
)
),
tidy = str_split(year, ":") %>%
map(as.integer) %>%
map(~seq(.x[1], .x[length(.x)])) %>%
flatten_int()
)
## Unit: microseconds
## expr min lq mean median uq max neval
## base 89.099 96.699 132.1684 102.5895 110.7165 2895.428 100
## tidy 631.817 647.812 672.5904 667.8250 686.2740 909.531 100
Upvotes: 2
Reputation: 206167
This is one way to split your years up.
years <- unlist(lapply(strsplit(year, ":"), function(x) {
x <- as.numeric(x)
if (length(x)==2) {
return(seq(x[1], x[2]))
} else {
return(x)
}
}))
hist(years)
First we do the splitting, then we either expand it as a sequence or return the numeric value, and finally we unlist() everything to get a simple vector back.
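The same three steps can be run one at a time to see what each produces. This is only a sketch on a shortened sample of the question's data; the intermediate names (`parts`, `bounds`, `expanded`) are made up for illustration:

```r
year <- c("1925:2002", "2008", "2008:2013")  # shortened sample of the question's data

parts  <- strsplit(year, ":")        # list of character vectors, length 1 or 2
bounds <- lapply(parts, as.numeric)  # same shape, now numeric

# Expand two-element entries into a full run of years; keep single years as-is
expanded <- lapply(bounds, function(x) {
  if (length(x) == 2) seq(x[1], x[2]) else x
})

print(lengths(expanded))     # 78 1 6: years contributed by each input
years <- unlist(expanded)    # one flat numeric vector
print(table(years)["2008"])  # 2: the year 2008 is covered by two inputs
```

The flat `years` vector is what gets passed to hist(). Because every year inside a range is counted once per row that covers it, overlapping ranges show up as taller bars.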
Upvotes: 2