Reputation: 1615
I have a large dataset (data frame) in which I want to find the number and the names of the categories in a column.
For example, my df looks like this:
A B
1 car
2 car
3 bus
4 car
5 plane
6 plane
7 plane
8 plane
9 plane
10 train
I would like to find:
car
bus
plane
train
4
How would I do that?
Upvotes: 7
Views: 85476
Reputation: 1
First, make sure the column has the right data type. R has most probably read it in as 'chr', which you can check with 'str(df)'. For the example data you provided, you will want to change this to a 'factor':
df$column <- as.factor(df$column)
Once the column is a factor, you can use 'levels(df$column)' to list the levels present in the dataset.
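A minimal sketch of these steps, assuming the question's data is rebuilt by hand as df with the category column named B (the df$column above is a placeholder name):

```r
# Rebuild the example data from the question (assumed layout: columns A and B).
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", "plane", "plane",
        "plane", "plane", "plane", "train"),
  stringsAsFactors = FALSE
)

str(df)                    # B is reported as chr at this point
df$B <- as.factor(df$B)    # convert the character column to a factor
levels(df$B)               # the distinct categories, sorted alphabetically
nlevels(df$B)              # how many categories there are
```

Note that levels() returns the categories in sorted order, not in the order they first appear in the column.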
Upvotes: 0
Reputation: 39
Additionally, to see the counts sorted, you can use the following:
sort(table(df$B), decreasing = TRUE)
This lists the categories in decreasing order of frequency.
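As a rough illustration on the question's data (df is rebuilt by hand here, and the variable name sorted_counts is made up for the example), the names of the sorted table give the categories ordered from most to least frequent:

```r
# Rebuild the example data from the question (assumed layout: columns A and B).
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", "plane", "plane",
        "plane", "plane", "plane", "train"),
  stringsAsFactors = FALSE
)

# Count each category, then order the counts from largest to smallest.
sorted_counts <- sort(table(df$B), decreasing = TRUE)

sorted_counts           # counts per category, most frequent first
names(sorted_counts)    # just the category names, in the same order
```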
Upvotes: 1
Reputation: 99331
I would recommend using factors here, if you are not already. It's straightforward and simple. levels() gives the unique categories and nlevels() gives the number of them. If we run droplevels() on the data first, we take care of any levels that may no longer be in the data.
with(droplevels(df), list(levels = levels(B), nlevels = nlevels(B)))
# $levels
# [1] "bus" "car" "plane" "train"
#
# $nlevels
# [1] 4
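To see why droplevels() can matter, here is a small sketch. It assumes B is already a factor and uses a hypothetical subset (the first four rows) that is not part of the original question:

```r
# Example data from the question, with B stored as a factor (assumption).
df <- data.frame(
  A = 1:10,
  B = factor(c("car", "car", "bus", "car", "plane", "plane",
               "plane", "plane", "plane", "train"))
)

# Hypothetical subset: the first four rows contain only "car" and "bus".
sub <- df[1:4, ]

levels(sub$B)                 # still reports all four original levels
levels(droplevels(sub)$B)     # only the levels actually present in the subset
nlevels(droplevels(sub)$B)    # number of levels after dropping unused ones
```

Subsetting keeps the full set of levels on the factor, so without droplevels() the count would include categories that no longer occur in the data.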
Upvotes: 2
Reputation:
This gives the unique values, their frequencies, and the number of unique values:
table(df$B)
bus car plane train
1 3 5 1
length(table(df$B))
[1] 4
Upvotes: 13
Reputation: 610
categories <- unique(yourDataFrame$yourColumn)
numberOfCategories <- length(categories)
Pretty painless.
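Applied to the question's example, this might look as follows (df and its column B are rebuilt by hand here; yourDataFrame and yourColumn above are placeholders):

```r
# Rebuild the example data from the question (assumed layout: columns A and B).
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", "plane", "plane",
        "plane", "plane", "plane", "train"),
  stringsAsFactors = FALSE
)

categories <- unique(df$B)                 # distinct values, in order of first appearance
numberOfCategories <- length(categories)   # how many distinct values there are

categories
numberOfCategories
```

Because unique() keeps values in order of first appearance, the result follows the same car, bus, plane, train order as the listing in the question.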
Upvotes: 27
Reputation: 4993
You can simply use unique:
x <- unique(df$B)
This extracts the unique values in the column. You can also use it with apply (or lapply) to get them for every column.
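One way to do this per column is sketched below. It uses lapply() and sapply() rather than apply(), since apply() converts a data frame to a matrix first; the variable names are made up for the example, and df is rebuilt from the question's data:

```r
# Rebuild the example data from the question (assumed layout: columns A and B).
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", "plane", "plane",
        "plane", "plane", "plane", "train"),
  stringsAsFactors = FALSE
)

# A list with one element per column, each holding that column's unique values.
unique_per_column <- lapply(df, unique)

# A named vector with the number of unique values in each column.
counts_per_column <- sapply(df, function(col) length(unique(col)))

unique_per_column$B
counts_per_column
```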
Upvotes: 10