With numeric and factor columns, summary() provides information that is useful for understanding the data. For example, consider what summary(iris) reports for the iris dataset.
Here, we see min, 1st quartile, median, mean, 3rd quartile, and max for the numeric columns, which is helpful for a quick spot-check. We also see counts on the factor column.
If I run the following code to create a data frame in which every column is character and then call summary() on it, I get a result that isn't very helpful as a summary of the values in my data (at least for the purposes I'm interested in).
library(dplyr)  # provides %>% and mutate_all()

# Convert every column of iris to character
iris2 <- iris %>%
  mutate_all(as.character)
summary(iris2)
In general, when I use summary() with character columns, I'd like to get something more like the results I get for factor columns.
I realize that I can convert my character columns to factor and then run summary(), with something like the below:
iris3 <- iris2 %>%
  mutate_all(as.factor)
summary(iris3)
Is there a way to avoid this extra step when I spot-check my data? I ultimately want to keep working with the data as character columns rather than factors, and would prefer not to switch back and forth between data types. It wouldn't matter to me if the conversion happens "behind the scenes". For what it's worth, an expanded summary() for numeric columns that included some of the high-frequency values would be interesting as well. Thank you in advance for any help in finding a way.
Reputation: 887911
If the goal is an overall summary of the dataset, skim() from the skimr package may be useful. It summarizes each column according to its type:
skimr::skim(iris)
Output:
── Data Summary ────────────────────────
                           Values
Name                       iris
Number of rows             150
Number of columns          5
_______________________
Column type frequency:
  factor                   1
  numeric                  4
________________________
Group variables            None

── Variable type: factor ──────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate ordered n_unique top_counts
1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50

── Variable type: numeric ─────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist
1 Sepal.Length          0             1 5.84 0.828 4.3 5.1  5.8 6.4  7.9 ▆▇▇▅▂
2 Sepal.Width           0             1 3.06 0.436   2 2.8    3 3.3  4.4 ▁▆▇▂▁
3 Petal.Length          0             1 3.76  1.77   1 1.6 4.35 5.1  6.9 ▇▁▆▇▂
4 Petal.Width           0             1 1.20 0.762 0.1 0.3  1.3 1.8  2.5 ▇▁▇▅▃
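If the factor-style counts from summary() are what is wanted, one option is to do the conversion "behind the scenes" inside a small wrapper. The following is only a sketch in base R: summary_chr is a hypothetical helper name, not a function from base R or skimr, and it assumes the input is a data frame. It converts character columns to factors on a local copy, so the original data frame keeps its character columns.

```r
# Hypothetical helper (not part of base R or skimr): summarise a data frame,
# treating character columns as factors only inside this function.
summary_chr <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    # Modifies the local copy only; the caller's data frame is unchanged
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  summary(df)
}

# Rebuild the all-character data frame from the question using base R only
iris2 <- iris
iris2[] <- lapply(iris2, as.character)

summary_chr(iris2)     # per-value counts, as summary() gives for factor columns
class(iris2$Species)   # the column in iris2 itself is still character
```

For columns with many distinct values, summary() should list only the most frequent levels and pool the remainder under "(Other)", which also gives a rough view of the high-frequency values; the cutoff is controlled by its maxsum argument.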