atsyplenkov

Reputation: 1304

Range to histogram

I'm trying to build a histogram from my data. It looks like this: a data frame where each row holds either a single year or a range of years. I need a histogram of all the years covered by the rows of my df.

year <- c("1925:2002",
          "2008",
          "1925:2002",
          "1925:2002",
          "1925:2002",
          "2008:2013",
          "1934",
          "1972:1988")

All I was able to figure out is to convert every string to a sequence with seq(), but it doesn't work properly:

for (i in 1:length(year)) {
  rr[i] <- seq(
    as.numeric(unlist(strsplit(year[i], ":"))[1]),
    as.numeric(unlist(strsplit(year[i], ":"))[2])
  )
}

Here is an example (image: base histogram)

Upvotes: 0

Views: 92

Answers (2)

hrbrmstr

Reputation: 78792

Tick the answer box for @MrFlick. I had done this at the same time and the only difference is the piping:

library(magrittr)

strsplit(year, ":") %>% 
  lapply(as.integer) %>% 
  lapply(function(x) seq(x[1], x[length(x)])) %>% 
  unlist() %>% 
  hist()

Full-on tidyverse:

library(tidyverse)

str_split(year, ":") %>%
  map(as.integer) %>% 
  map(~seq(.x[1], .x[length(.x)])) %>% 
  flatten_int() %>% 
  hist()

To defend my comments, should any tidyverse 4eva folks join the fray:

library(tidyverse)
library(microbenchmark)

microbenchmark(
  base = as.integer(
    unlist(
      lapply(
        lapply(
          strsplit(year, ":"),
          as.integer
        ),
        function(x) seq(x[1], x[length(x)])
      ),
      use.names = FALSE
    )
  ),
  tidy = str_split(year, ":") %>%
    map(as.integer) %>% 
    map(~seq(.x[1], .x[length(.x)])) %>% 
    flatten_int()
)
## Unit: microseconds
##  expr     min      lq     mean   median       uq      max neval
##  base  89.099  96.699 132.1684 102.5895 110.7165 2895.428   100
##  tidy 631.817 647.812 672.5904 667.8250 686.2740  909.531   100

Upvotes: 2

MrFlick

Reputation: 206167

This is one way to split your years up.

years <- unlist(lapply(strsplit(year, ":"), function(x) {
  x <- as.numeric(x)
  if (length(x)==2) {
    return(seq(x[1], x[2]))
  } else {
    return(x)
  }
}))
hist(years)

First we do the splitting, then we either expand it as a sequence or return the numeric value, and finally we unlist() everything to get a simple vector back.
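For comparison, the loop from the question can probably be made to work along the same lines. The sketch below assumes the trouble is that `rr[i] <- seq(...)` tries to put a whole sequence into a single element, so it stores each sequence in a list instead. The names `bounds` and `years_loop` are made up for illustration, and `rr` is assumed to be a pre-allocated list.

```r
year <- c("1925:2002", "2008", "1925:2002", "1925:2002",
          "1925:2002", "2008:2013", "1934", "1972:1988")

# Assumed container: a list, since every range expands to a different
# number of years and a single vector element cannot hold a sequence.
rr <- vector("list", length(year))

for (i in seq_along(year)) {
  # Split once and reuse the result instead of calling strsplit() twice.
  bounds <- as.numeric(strsplit(year[i], ":")[[1]])
  # A lone year such as "2008" yields one bound, so start and end coincide.
  rr[[i]] <- seq(bounds[1], bounds[length(bounds)])
}

# Flatten the list into one numeric vector and plot it.
years_loop <- unlist(rr)
hist(years_loop)
```

This should give the same vector as the lapply() version above; the loop is just more verbose.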


Upvotes: 2
