Ben

Reputation: 1506

How do you get the structure of a data frame with a length limit on the variable names?

I have a data frame for a raw data set where the variable names are extremely long. I would like to display the structure of the data frame using the str function, and impose a character limit on the displayed variable names, so that it is easier to read.

Here is a reproducible example of the kind of thing I am talking about.

# Data frame with long names
set.seed(1)
DATA <- data.frame(ID = 1:50,
                   Value = rnorm(50),
                   This_variable_has_a_really_long_and_annoying_name_to_illustrate_the_problem_of_a_data_frame_with_a_long_and_annoying_name = runif(50))

# Show structure of DATA
str(DATA)

> str(DATA)
'data.frame':   50 obs. of  3 variables:
 $ ID                                                                                                                       : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Value                                                                                                                    : num  -0.626 0.184 -0.836 1.595 0.33 ...
 $ This_variable_has_a_really_long_and_annoying_name_to_illustrate_the_problem_of_a_data_frame_with_a_long_and_annoying_name: num  0.655 0.353 0.27 0.993 0.633 ...

I would like to use the str function but impose an upper limit on the number of characters displayed in the variable names, so that I get something like the output below. I have read the documentation, but I have not been able to identify whether there is an option to do this. (There seem to be options to limit the lengths of strings in the data, but I cannot see an option to limit the length of the variable names.)

'data.frame':   50 obs. of  3 variables:
 $ ID                   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Value                : num  -0.626 0.184 -0.836 1.595 0.33 ...
 $ This_variable_has... : num  0.655 0.353 0.27 0.993 0.633 ...

Question: Is there a simple way to get the structure of the data frame, but imposing a limitation on the length of the variable names (to get output something like the above)?

Upvotes: 1

Views: 199

Answers (1)

lroha

Reputation: 34601

As far as I can see you're right: there doesn't seem to be a built-in way to control this. You also can't easily adjust it after the fact, because str() prints its output and returns NULL invisibly rather than returning an object you could modify. So the easiest option seems to be renaming the columns beforehand. Using setNames(), which renames a copy and leaves the original data frame untouched, you could write a simple function to do this:

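short_str <- function(data, n = 20, ...) {
  name_vec <- names(data)
  # Names longer than n characters are cut to n characters in total,
  # ending in "... " (three dots plus a trailing space)
  short_names <- ifelse(
    nchar(name_vec) > n,
    paste0(substring(name_vec, 1, n - 4), "... "),
    name_vec
  )
  # Only a renamed copy is passed to str(); extra arguments go through `...`
  str(setNames(data, short_names), ...)
}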

short_str(DATA)

'data.frame':   50 obs. of  3 variables:
 $ ID                  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Value               : num  -0.626 0.184 -0.836 1.595 0.33 ...
 $ This_variable_ha... : num  0.655 0.353 0.27 0.993 0.633 ...
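As a rough sketch of how the n argument and the pass-through of extra str() arguments might be used, the snippet below repeats the helper so it runs on its own and applies it to a small data frame. The data frame df and its column names are hypothetical, made up for illustration; capture.output() is used only so the printed lines can be inspected.

```r
# Self-contained copy of the helper from the answer
short_str <- function(data, n = 20, ...) {
  name_vec <- names(data)
  short_names <- ifelse(
    nchar(name_vec) > n,
    paste0(substring(name_vec, 1, n - 4), "... "),
    name_vec
  )
  str(setNames(data, short_names), ...)
}

# Hypothetical example data with one over-long column name
df <- data.frame(
  ID = 1:5,
  A_column_name_that_is_far_too_long_to_read_comfortably = letters[1:5]
)

# Limit displayed names to 12 characters and pass vec.len through to str();
# capture.output() collects the lines that str() prints
out <- capture.output(short_str(df, n = 12, vec.len = 2))
print(out)

# The original data frame still has its full column names
names(df)
```

Because the renaming happens on a copy inside the function, nothing needs to be undone afterwards. Note that truncation can make two long names look identical in the display if they share the same first n - 4 characters.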

Upvotes: 2
