Sweepy Dodo

Reputation: 1873

Storing unique values of each column (of a df) in list

It is straightforward to obtain the unique values of a column using unique. However, I am looking to do the same for multiple columns in a data frame and store the results in a list, all using base R. Importantly, I do not need unique combinations, just the unique values of each individual column. I currently have the below:

# dummy data
df = data.frame(a = LETTERS[1:4]
                ,b = 1:4)

# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols)
{
  x = unique(i)
  unique_values_by_col[[i]] = x
}

The problem comes when displaying unique_values_by_col, as it shows as empty. I believe the problem is that i is being passed into the loop as text rather than as a variable. Any help would be greatly appreciated. Thank you.

Upvotes: 3

Views: 1095

Answers (4)

Simon C.

Reputation: 1067

You could also use apply, which is designed to run a function over the rows or columns of a matrix or data frame:

apply(df,2,unique)

result:

> apply(df,2,unique)
     a   b
[1,] "A" "1"
[2,] "B" "2"
[3,] "C" "3"
[4,] "D" "4"

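Though if you want a list, lapply returns one, so it may be the better choice. Note also that apply coerces the data frame to a matrix first, which is why the numbers in b come back as character strings above.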
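To make the difference concrete, here is a minimal sketch using the dummy data from the question. It only compares the type of column b after each call; the names via_apply and via_lapply are made up for illustration.

```r
# Same dummy data as in the question
df <- data.frame(a = LETTERS[1:4], b = 1:4)

# apply() converts df to a matrix first; because column a is not numeric,
# the whole matrix (including b) ends up as character
via_apply <- apply(df, 2, unique)
class(via_apply[, "b"])   # "character"

# lapply() passes each column through unchanged, so b stays integer
via_lapply <- lapply(df, unique)
class(via_lapply$b)       # "integer"
```

apply only gives a matrix here because both columns happen to have the same number of unique values; with unequal counts it falls back to a list, so the shape of the result is less predictable than with lapply.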

Upvotes: 2

NelsonGon

Reputation: 13319

Could this be what you're trying to do?

Map(unique,df)

Result:

$a
[1] A B C D
Levels: A B C D

$b
[1] 1 2 3 4

Upvotes: 1

s_baldur

Reputation: 33603

Your for loop is almost right, just needs one fix to work:

# for loop
cols = names(df)
unique_values_by_col = list()
for (i in cols) {
  x = unique(df[[i]])
  unique_values_by_col[[i]] = x
}
unique_values_by_col
# $a
# [1] A B C D
# Levels: A B C D
# 
# $b
# [1] 1 2 3 4

i is just a character string, the name of a column within df, so unique(i) returns that name rather than the column's unique values. Use df[[i]] to look the column up by name.
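A small sketch of the difference, using the question's dummy data and picking column b for illustration:

```r
df <- data.frame(a = LETTERS[1:4], b = 1:4)

i <- "b"          # what the loop variable holds on one iteration

unique(i)         # operates on the string itself: "b"
unique(df[[i]])   # operates on the column named by i: 1 2 3 4
```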


Anyhow, the most standard way for this task is lapply() as shown by demirev.

Upvotes: 1

demirev

Reputation: 195

Why not avoid the for loop altogether using lapply:

lapply(df, unique)

Resulting in:

$a
[1] A B C D
Levels: A B C D

$b
[1] 1 2 3 4
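The dummy data has no repeated values, so the effect is easier to see on a data frame that does. The following is only a sketch: df2 and its contents are made up for illustration, and stringsAsFactors = FALSE is set explicitly so that a is a plain character column rather than a factor (which is where the "Levels:" line in the output above comes from).

```r
# Hypothetical data with repeated values (not from the question)
df2 <- data.frame(a = c("A", "A", "B", "C"),
                  b = c(1L, 1L, 1L, 2L),
                  stringsAsFactors = FALSE)

# One list element per column, each holding that column's unique values
unique_values_by_col <- lapply(df2, unique)

# Number of unique values per column: 3 for a, 2 for b
lengths(unique_values_by_col)
```

Because the result is a list, the columns are free to have different numbers of unique values.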

Upvotes: 2
