elikesprogramming

Reputation: 2598

alternative to `str()` in R

Perhaps it is just me, but I have always found str unsatisfactory. It is frequently too verbose, yet on many occasions not very informative.

I actually really like the description of the function (?str):

Compactly display the internal structure of an R object

and this bit in particular

Ideally, only one line for each ‘basic’ structure is displayed.

Yet in many cases the default str implementation simply does not do justice to that description.

OK, let's say it works reasonably well for data.frames.

library(ggplot2)
str(mpg)

> str(mpg)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   234 obs. of  11 variables:
 $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
 $ model       : chr  "a4" "a4" "a4" "a4" ...
 $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr  "f" "f" "f" "f" ...
 $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr  "p" "p" "p" "p" ...
 $ class       : chr  "compact" "compact" "compact" "compact" ...

Yet even for a data.frame it's not as informative as I would like. In addition to the class, it would be very useful if it showed the number of NA values and the number of unique values, for example.
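A per-column overview along these lines can be sketched in base R. This is only an illustration: `col_summary` is a hypothetical helper name, not a function from any package, and it is shown on the built-in `airquality` data because that data set contains missing values.

```r
# Hypothetical helper: one row per column with class, NA count and unique count.
col_summary <- function(df) {
  data.frame(
    column   = names(df),
    # Only the first class is reported, e.g. "factor" or "integer".
    class    = vapply(df, function(x) class(x)[1], character(1)),
    n_na     = vapply(df, function(x) sum(is.na(x)), integer(1)),
    # Note: NA is counted as one of the unique values here.
    n_unique = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

col_summary(airquality)
```

The result is itself a data.frame, so it can be sorted or filtered like any other, for instance to list only the columns that contain NA values.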

But for other objects, it quickly becomes unmanageable. For example:

gp <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()
str(gp)

> str(gp)
List of 9
 $ data       :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    234 obs. of  11 variables:
  ..$ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
  ..$ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
  ..$ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
  ..$ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
  ..$ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
  ..$ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
  ..$ drv         : chr [1:234] "f" "f" "f" "f" ...
  ..$ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
  ..$ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
  ..$ fl          : chr [1:234] "p" "p" "p" "p" ...
  ..$ class       : chr [1:234] "compact" "compact" "compact" "compact" ...
 $ layers     :List of 1
  ..$ :Classes 'LayerInstance', 'Layer', 'ggproto' <ggproto object: Class LayerInstance, Layer>
    aes_params: list
    compute_aesthetics: function
    compute_geom_1: function
    compute_geom_2: function
    compute_position: function
    compute_statistic: function
    data: waiver
    draw_geom: function
    geom: <ggproto object: Class GeomPoint, Geom>
        aesthetics: function
        default_aes: uneval
        draw_group: function
        draw_key: function
        draw_layer: function
        draw_panel: function
        extra_params: na.rm
        handle_na: function
        non_missing_aes: size shape
        parameters: function
        required_aes: x y
        setup_data: function
        use_defaults: function
        super:  <ggproto object: Class Geom>
    geom_params: list
    inherit.aes: TRUE
    layer_data: function
    map_statistic: function
    mapping: NULL
    position: <ggproto object: Class PositionIdentity, Position>
        compute_layer: function
        compute_panel: function
        required_aes: 
        setup_data: function
        setup_params: function
        super:  <ggproto object: Class Position>
    print: function
    show.legend: NA
    stat: <ggproto object: Class StatIdentity, Stat>
        compute_group: function
        compute_layer: function
        compute_panel: function
        default_aes: uneval
        extra_params: na.rm
        non_missing_aes: 
        parameters: function
        required_aes: 
        retransform: TRUE
        setup_data: function
        setup_params: function
        super:  <ggproto object: Class Stat>
    stat_params: list
    subset: NULL
    super:  <ggproto object: Class Layer> 
 $ scales     :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
    add: function
    clone: function
    find: function
    get_scales: function
    has_scale: function
    input: function
    n: function
    non_position_scales: function
    scales: list
    super:  <ggproto object: Class ScalesList> 
 $ mapping    :List of 2
  ..$ x: symbol displ
  ..$ y: symbol hwy
 $ theme      : list()
 $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
    aspect: function
    distance: function
    expand: TRUE
    is_linear: function
    labels: function
    limits: list
    range: function
    render_axis_h: function
    render_axis_v: function
    render_bg: function
    render_fg: function
    train: function
    transform: function
    super:  <ggproto object: Class CoordCartesian, Coord> 
 $ facet      :List of 1
  ..$ shrink: logi TRUE
  ..- attr(*, "class")= chr [1:2] "null" "facet"
 $ plot_env   :<environment: R_GlobalEnv> 
 $ labels     :List of 2
  ..$ x: chr "displ"
  ..$ y: chr "hwy"
 - attr(*, "class")= chr [1:2] "gg" "ggplot"

Whaaattttt??? What happened to "Compactly display"? That's not compact!

And it can get worse, crazy scary even, for S4 objects. If you want, try this:

library(rworldmap)
newmap <- getMap(resolution = "coarse")
str(newmap)

I do not post the output here because it is too much. It does not even fit in the console buffer!

How can you possibly understand the internal structure of the object with such a NON-compact display? It's just too many details and you easily get lost. Or at least I do.

Well, all right. Before someone tells me "hey, check out ?str and tweak the arguments", that's what I did. Of course it can get better, but I am still kind of disappointed with str.

The best solution I've got is to create a function that does this:

if(isS4(obj)){
    str(obj, max.level = 2, give.attr = FALSE, give.head = FALSE)
} else {
    str(obj, max.level = 1, give.attr = FALSE, give.head = FALSE)
}

This compactly displays the top-level structures of the object. The output for the sp object above (an S4 object) becomes much more insightful:

Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 243 obs. of  49 variables:
  ..@ polygons   :List of 243
  .. .. [list output truncated]
  ..@ plotOrder  :7 135 28 167 31 23 9 66 84 5 ...
  ..@ bbox       :-180 -90 180 83.6
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot

So now you can see there are 5 top level structures, and you can investigate them further individually.

Similarly for the ggplot object above, you can now see:

List of 9
 $ data       :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    234 obs. of  11 variables:
 $ layers     :List of 1
 $ scales     :Classes 'ScalesList', 'ggproto' 
 $ mapping    :List of 2
 $ theme      : list()
 $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' 
 $ facet      :List of 1
 $ plot_env   :
 $ labels     :List of 2
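The if/else snippet can be wrapped into a reusable function. A minimal sketch, where `compact_str` is a made-up name and the example object is a simple lm fit on the built-in `mtcars` data rather than the objects used earlier:

```r
# Hypothetical wrapper around str() using the arguments from the snippet.
compact_str <- function(obj) {
  # S4 objects get one extra level so that their slots are listed.
  level <- if (isS4(obj)) 2 else 1
  str(obj, max.level = level, give.attr = FALSE, give.head = FALSE)
}

compact_str(lm(mpg ~ cyl, data = mtcars))
```

Each top-level component can then be inspected on its own by calling the function again on that component.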

Although this is much better, I still feel it could be much more insightful. So, perhaps someone has felt the same way and created a nice function that is more informative and still compactly displays the information. Anyone?

Upvotes: 8

Views: 5286

Answers (2)

shs

Reputation: 3899

There is the lobstr package by Hadley Wickham. Besides several other more or less helpful functions, it includes lobstr::tree(), which tries to be more predictable, more compact and overall more helpful than str().

An important difference between the two is that str() is an S3 generic whereas lobstr::tree() is not. That means package developers can and will include their own methods for str() which can substantially improve the usefulness of str(). But it also means that str() output can be very inconsistent.
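To illustrate the S3 point, here is a minimal sketch of how a class-specific str() method changes the output. The `temperature` class and its fields are invented for this example and do not come from any package.

```r
# A toy S3 object; the class name and fields are hypothetical.
temp <- structure(list(value = 21.5, unit = "C"), class = "temperature")

# A custom method that prints one line instead of the default list dump.
str.temperature <- function(object, ...) {
  cat(" temperature:", object$value, object$unit, "\n")
  invisible(NULL)
}

str(temp)           # dispatches to str.temperature
str(unclass(temp))  # without the class, the default list display is used
```

The same object is shown in two different ways depending on whether a method exists, which is the source of the inconsistency mentioned above.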

For comparison, here is a display of the structure of a simple lm() with both functions. lobstr::tree() also prints a colorized output, which improves legibility further, but you obviously can't see the colors here on SO. Note in particular the much more concise and useful parts of the formula and the data frame items:

m <- lm(mpg~cyl, mtcars)

lobstr::tree(m)
#> S3<lm>
#> ├─coefficients<dbl [2]>: 37.8845764854614, -2.87579013906448
#> ├─residuals<dbl [32]>: 0.370164348925359, 0.370164348925418, -3.58141592920354, 0.770164348925411, 3.82174462705436, -2.52983565107459, -0.578255372945636, -1.98141592920354, -3.58141592920354, -1.42983565107459, ...
#> ├─effects<dbl [32]>: -113.649737406208, -28.5956806590543, -3.70425398161014, 0.709596949580206, 3.82344788077055, -2.59040305041979, -0.576552119229446, -2.10425398161014, -3.70425398161014, -1.49040305041979, ...
#> ├─rank: 2
#> ├─fitted.values<dbl [32]>: 20.6298356510746, 20.6298356510746, 26.3814159292035, 20.6298356510746, 14.8782553729456, 20.6298356510746, 14.8782553729456, 26.3814159292035, 26.3814159292035, 20.6298356510746, ...
#> ├─assign<int [2]>: 0, 1
#> ├─qr: S3<qr>
#> │ ├─qr<dbl [64]>: -5.65685424949238, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, 0.176776695296637, ...
#> │ ├─qraux<dbl [2]>: 1.17677669529664, 1.01602374277435
#> │ ├─pivot<int [2]>: 1, 2
#> │ ├─tol: 1e-07
#> │ └─rank: 2
#> ├─df.residual: 30
#> ├─xlevels: <list>
#> ├─call: <language> lm(formula = mpg ~ cyl, data = mtcars)
#> ├─terms: S3<terms/formula> mpg ~ cyl
#> └─model: S3<data.frame>
#>   ├─mpg<dbl [32]>: 21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, ...
#>   └─cyl<dbl [32]>: 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, ...
str(m)
#> List of 12
#>  $ coefficients : Named num [1:2] 37.88 -2.88
#>   ..- attr(*, "names")= chr [1:2] "(Intercept)" "cyl"
#>  $ residuals    : Named num [1:32] 0.37 0.37 -3.58 0.77 3.82 ...
#>   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#>  $ effects      : Named num [1:32] -113.65 -28.6 -3.7 0.71 3.82 ...
#>   ..- attr(*, "names")= chr [1:32] "(Intercept)" "cyl" "" "" ...
#>  $ rank         : int 2
#>  $ fitted.values: Named num [1:32] 20.6 20.6 26.4 20.6 14.9 ...
#>   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#>  $ assign       : int [1:2] 0 1
#>  $ qr           :List of 5
#>   ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
#>   .. .. ..$ : chr [1:2] "(Intercept)" "cyl"
#>   .. ..- attr(*, "assign")= int [1:2] 0 1
#>   ..$ qraux: num [1:2] 1.18 1.02
#>   ..$ pivot: int [1:2] 1 2
#>   ..$ tol  : num 1e-07
#>   ..$ rank : int 2
#>   ..- attr(*, "class")= chr "qr"
#>  $ df.residual  : int 30
#>  $ xlevels      : Named list()
#>  $ call         : language lm(formula = mpg ~ cyl, data = mtcars)
#>  $ terms        :Classes 'terms', 'formula'  language mpg ~ cyl
#>   .. ..- attr(*, "variables")= language list(mpg, cyl)
#>   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:2] "mpg" "cyl"
#>   .. .. .. ..$ : chr "cyl"
#>   .. ..- attr(*, "term.labels")= chr "cyl"
#>   .. ..- attr(*, "order")= int 1
#>   .. ..- attr(*, "intercept")= int 1
#>   .. ..- attr(*, "response")= int 1
#>   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. ..- attr(*, "predvars")= language list(mpg, cyl)
#>   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#>   .. .. ..- attr(*, "names")= chr [1:2] "mpg" "cyl"
#>  $ model        :'data.frame':   32 obs. of  2 variables:
#>   ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>   ..$ cyl: num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#>   ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ cyl
#>   .. .. ..- attr(*, "variables")= language list(mpg, cyl)
#>   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
#>   .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. ..$ : chr [1:2] "mpg" "cyl"
#>   .. .. .. .. ..$ : chr "cyl"
#>   .. .. ..- attr(*, "term.labels")= chr "cyl"
#>   .. .. ..- attr(*, "order")= int 1
#>   .. .. ..- attr(*, "intercept")= int 1
#>   .. .. ..- attr(*, "response")= int 1
#>   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
#>   .. .. ..- attr(*, "predvars")= language list(mpg, cyl)
#>   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
#>   .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "cyl"
#>  - attr(*, "class")= chr "lm"

Created on 2022-11-23 with reprex v2.0.2

Upvotes: 1

pauljeba

Reputation: 770

In such situations I use glimpse from the tibble package, which is less verbose and gives a brief description of the data structure.

library(tibble)
glimpse(gp)

Upvotes: 10
