Evelyn Abbott

Reputation: 95

Assign specific values to unique values in all dataframe columns in R

I have a data frame with many columns, and each column has 3 possible values. These 3 values are not the same for every column, and some columns contain NA. Like so:

df = data.frame(
  "a" = c(13, 33, 11, 33),
  "b" = c(11, 11, 14, 11),
  "c" = c(44, 22, NA, 24)
)
       a  b  c
    1 13 11 44
    2 33 11 22
    3 11 14 NA
    4 33 11 24

Each unique value (per column) should be labeled 0, 1, or 2: "1" if its two digits differ (e.g. 13), and "0" or "2" if its two digits are the same (e.g. 11 or 33). NAs should be kept. Like this:

   a  b  c
1  1  0  0
2  2  0  2
3  0  1  NA
4  2  0  1

Which of the repeated-digit values gets "0" and which gets "2" is not important, as long as the assignment is consistent within each column.

Upvotes: 2

Views: 83

Answers (2)

Edward

Reputation: 19339

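# Start at 1 (mixed digits) or 2 (repeated digits, i.e. a multiple of 11),
# then subtract 2 for the smaller repeated-digit value so it becomes 0.
sapply(df, \(x) 1 + (x %% 11 == 0) - 2 * (x == min(x[x %% 11 == 0], na.rm = TRUE)))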
     a b  c
[1,] 1 0  2
[2,] 2 0  0
[3,] 0 1 NA
[4,] 2 0  1
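
To see why this works, it may help to evaluate the pieces separately for one column. The sketch below assumes, as in the question, that every value has two digits, so a value is a multiple of 11 exactly when both digits are the same. The intermediate names `same`, `base`, and `low` are only illustrative and do not appear in the one-liner.

```r
x <- c(13, 33, 11, 33)               # column a from the question

same <- x %% 11 == 0                 # TRUE where both digits match (11, 22, ..., 99)
base <- 1 + same                     # 1 for mixed digits, 2 for repeated digits
low  <- min(x[same], na.rm = TRUE)   # smaller of the repeated-digit values
res  <- base - 2 * (x == low)        # the smaller repeated-digit value drops from 2 to 0

print(res)
# [1] 1 2 0 2
```

This matches column a in the output above. An NA in the column propagates through each step, which is why NAs are kept.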

If the unique values are always XX, XY, and YY (but never YX) where X<Y, then XX is the smallest value in the column, so we can simplify the above to:

sapply(df, \(x) 1+(x%%11==0) - 2*(x==min(x, na.rm=TRUE)))
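
One way to check the simplification on the example data is to compare the two versions directly. This is only a sketch: it rebuilds `df` from the question, and `full` and `simple` are illustrative names.

```r
df <- data.frame(
  a = c(13, 33, 11, 33),
  b = c(11, 11, 14, 11),
  c = c(44, 22, NA, 24)
)

# Version that takes the minimum over repeated-digit values only
full <- sapply(df, \(x) 1 + (x %% 11 == 0) - 2 * (x == min(x[x %% 11 == 0], na.rm = TRUE)))

# Simplified version that takes the minimum over the whole column
simple <- sapply(df, \(x) 1 + (x %% 11 == 0) - 2 * (x == min(x, na.rm = TRUE)))

print(identical(full, simple))
# [1] TRUE
```

The two agree here because every column contains its XX value. If a column had no XX value, `min(x)` would pick XY or YY instead, so the simplified version relies on that assumption.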

Upvotes: 5

Darren Tsai

Reputation: 35604

You can try this, which splits each value into its digits to find the repeated-digit values:

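lapply(df, \(x) {
  # z is TRUE where all digits of a value are identical (NA also gives TRUE)
  z <- sapply(strsplit(as.character(x), ''), \(y) length(unique(y)) == 1)
  # repeated-digit values get 0 or 2 by sorted order; mixed-digit values get 1
  ifelse(z, (match(x, levels(factor(x[z]))) - 1) * 2, 1)
}) |> as.data.frame()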

#   a b  c
# 1 1 0  2
# 2 2 0  0
# 3 0 1 NA
# 4 2 0  1
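
To follow what happens to NA in this approach, the steps can be run one at a time on column c. This is a sketch under the same two-digit assumption as the question; `digits` and `lv` are illustrative names that are not in the code above.

```r
x <- c(44, 22, NA, 24)                   # column c from the question

digits <- strsplit(as.character(x), "")  # the NA entry stays a single NA
z  <- sapply(digits, \(y) length(unique(y)) == 1)
lv <- levels(factor(x[z]))               # sorted repeated-digit values; NA is not a level
res <- ifelse(z, (match(x, lv) - 1) * 2, 1)

print(z)
# [1]  TRUE  TRUE  TRUE FALSE
print(lv)
# [1] "22" "44"
print(res)
# [1]  2  0 NA  1
```

The NA entry counts as "all digits identical" in `z`, but `match()` finds no level for it, so it comes out as NA rather than 0 or 2.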

Upvotes: 2
