user3443063

Reputation: 1615

R get all categories in column

I have a large dataset (a data frame) and I want to find the number and the names of the categories in a column.

For example, suppose my df looks like this:

 A    B
 1    car
 2    car
 3    bus
 4    car
 5    plane
 6    plane
 7    plane
 8    plane
 9    plane
 10   train

I would want to find:

  car
  bus
  plane
  train
  4

How would I do that?

Upvotes: 7

Views: 85476

Answers (6)

Rachael_Adl

Reputation: 1

First, check that the column has a suitable data type. R has most likely read it in as 'chr', which you can check with 'str(df)'. For data like your example, you will want to convert it to a 'factor':

df$column <- as.factor(df$column)

Once the column is a factor, 'levels(df$column)' returns the category names and 'nlevels(df$column)' returns how many there are.
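As a minimal sketch of the conversion described above, here it is applied to the example data from the question. The data frame is rebuilt by hand, and the names cats and n_cats are only illustrative.

```r
# Example data reconstructed from the question; stringsAsFactors = FALSE
# mimics a column that was read in as character ('chr').
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", rep("plane", 5), "train"),
  stringsAsFactors = FALSE
)

str(df$B)                 # shows a chr column before conversion
df$B <- as.factor(df$B)   # convert the column to a factor

cats   <- levels(df$B)    # category names (levels are stored in sorted order)
n_cats <- nlevels(df$B)   # number of categories
```

Note that levels() returns the categories in sorted order, not in order of first appearance.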

Upvotes: 0

V C

Reputation: 39

Additionally, to see the categories sorted by how often they occur, you can use the following:

sort(table(df$B), decreasing = TRUE)

This lists each category with its count, in decreasing order of frequency.
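A short sketch of the sorted frequency table on the question's example data (rebuilt by hand here; counts is an illustrative name). The most frequent category comes first, and the names of the result are the categories themselves.

```r
# Example data reconstructed from the question.
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", rep("plane", 5), "train"),
  stringsAsFactors = FALSE
)

counts <- sort(table(df$B), decreasing = TRUE)  # frequency table, largest count first

names(counts)    # category names, most frequent first
length(counts)   # number of distinct categories
```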

Upvotes: 1

Rich Scriven

Reputation: 99331

I would recommend you use factors here, if you are not already. It's straightforward and simple.

levels() gives the unique categories and nlevels() gives the number of them. If we run droplevels() on the data first, we take care of any levels that may no longer be in the data.

with(droplevels(df), list(levels = levels(B), nlevels = nlevels(B)))
# $levels
# [1] "bus"   "car"   "plane" "train"
#
# $nlevels
# [1] 4
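To illustrate why droplevels() can matter, here is a small sketch, assuming the column is already a factor and the data has been subset. The names no_train, stale and fresh are hypothetical.

```r
# Example data reconstructed from the question, with B stored as a factor.
df <- data.frame(
  A = 1:10,
  B = factor(c("car", "car", "bus", "car", rep("plane", 5), "train"))
)

# Subsetting keeps the original level set, even for categories with no rows left.
no_train <- df[df$B != "train", ]

stale <- levels(no_train$B)              # still lists "train"
fresh <- levels(droplevels(no_train)$B)  # unused level "train" removed
```

Without droplevels(), nlevels() would still report 4 for the subset, although only 3 categories remain in it.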

Upvotes: 2

user8552923

Reputation:

This gives the unique values, the number of unique values, and their frequencies:

table(df$B)
  bus   car plane train 
    1     3     5     1 

length(table(df$B))
[1] 4

Upvotes: 13

CCD

Reputation: 610

categories <- unique(yourDataFrame$yourColumn) 
numberOfCategories <- length(categories)

Pretty painless.
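One caveat worth knowing, sketched on a made-up vector x (not from the question): unique() treats NA as a value of its own, so the count includes it unless missing values are filtered out first.

```r
# Hypothetical column containing a missing value.
x <- c("car", "bus", NA, "car", "plane")

n_with_na    <- length(unique(x))              # NA counted as a category
n_without_na <- length(unique(x[!is.na(x)]))   # NA excluded before counting
```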

Upvotes: 27

sconfluentus

Reputation: 4993

You can simply use unique:

x <- unique(df$B)

This extracts the unique values in the column. You can combine it with apply (or lapply, for a data frame) to get them for each column too!
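A minimal sketch of the per-column idea, using lapply() rather than apply() because apply() first coerces a data frame to a matrix, which would turn every column into the same type. The names per_column and n_unique are illustrative.

```r
# Example data reconstructed from the question.
df <- data.frame(
  A = 1:10,
  B = c("car", "car", "bus", "car", rep("plane", 5), "train"),
  stringsAsFactors = FALSE
)

# One vector of unique values per column, each keeping its own type.
per_column <- lapply(df, unique)

# Number of unique values per column, as a named vector.
n_unique <- sapply(df, function(col) length(unique(col)))
```

Here per_column$B holds the categories in order of first appearance, and n_unique["B"] gives their count.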

Upvotes: 10
