Parseltongue

Reputation: 11657

Create new set of variables equal to the level of a factor in dplyr

I have a data.frame with 100 columns that follow the naming convention word and word_answer:

df <- data.frame(apple = "57%", apple_answer = "22%", dog = "82%", dog_answer = "16%")

I set the levels of the two above factor variables like so:

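# factor() keeps the stored values; `levels<-` on an existing factor would rename them instead
df$apple <- factor(df$apple, levels = c( "66%","57%","48%","39%","30%","22%","12%" ))
df$dog <- factor(df$dog, levels = c( "82%","71%","60%","49%","38%","27%","16%" ))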

I'm trying to compute a distance score for each word: the numeric level of the word's value minus the numeric level of its corresponding word_answer value, both looked up in the word's levels.

So, for example, in the case of "apple", the first row for apple is "57%", which is the 2nd level of that factor:

> which(levels(df$apple) == "57%")
[1] 2

The corresponding apple_answer value, "22%", is the 6th level of the same factor:

> which(levels(df$apple) == "22%")
[1] 6

So the distance score in this case would be 2 - 6 = -4.

How can I compute these distance scores for every variable in my dataset?

Upvotes: 1

Views: 455

Answers (2)

Haci Duru

Reputation: 456

You can also use the apply function, like this:

df$apple_dist = apply(df[,1:2], 1, function(x) {
    which(levels(df$apple) == x[1]) - which(levels(df$apple) == x[2])
})

df$dog_dist = apply(df[,3:4], 1, function(x) {
    which(levels(df$dog) == x[1]) - which(levels(df$dog) == x[2])
})

> df
  apple apple_answer dog dog_answer apple_dist dog_dist
1   57%          22% 82%        16%         -4       -6
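With 100 columns, writing one apply call per pair gets tedious. A minimal sketch of the same idea in a loop, assuming every word column has a partner named word_answer and that the word columns are real factors (the example data below is hypothetical and only mirrors the question):

```r
# Hypothetical example data mirroring the question; the word columns are real factors.
df <- data.frame(apple = "57%", apple_answer = "22%",
                 dog = "82%", dog_answer = "16%",
                 stringsAsFactors = FALSE)
df$apple <- factor(df$apple, levels = c("66%", "57%", "48%", "39%", "30%", "22%", "12%"))
df$dog   <- factor(df$dog,   levels = c("82%", "71%", "60%", "49%", "38%", "27%", "16%"))

# Word columns are those that have a matching "<word>_answer" column.
words <- sub("_answer$", "", grep("_answer$", names(df), value = TRUE))

for (w in words) {
  lv <- levels(df[[w]])  # level order of the word column
  word_pos   <- match(as.character(df[[w]]), lv)
  answer_pos <- match(as.character(df[[paste0(w, "_answer")]]), lv)
  df[[paste0(w, "_dist")]] <- word_pos - answer_pos
}

df
```

Pairing the columns by name rather than by position means the result does not depend on the column order in the data.frame.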

Upvotes: 1

Ronak Shah

Reputation: 388817

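You can split the columns into two groups: the words and their corresponding answers. Use match to get the position of each value in the word's levels, subtract the answer's position from the word's position, and store the results in new columns. This relies on the answer columns appearing in the same order as the word columns.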

answer_cols <- grep('_answer', names(df))
new_cols <- paste0(names(df)[-answer_cols], '_dist')

df[new_cols] <- Map(function(x, y) match(x, levels(x)) - match(y, levels(x)),
                    df[-answer_cols], df[answer_cols])

df
#  apple apple_answer dog dog_answer apple_dist dog_dist
#1   57%          22% 82%        16%         -4       -6
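If both columns of a pair are stored as factors with the same level order, the positions can probably be read straight from the factor's integer codes instead of using match. A small sketch on a hypothetical three-row example (the vectors and the shared level order `lv` are made up for illustration):

```r
# Hypothetical three-row example; `lv` is an assumed shared level order.
lv <- c("66%", "57%", "48%", "39%", "30%", "22%", "12%")
apple        <- factor(c("57%", "66%", "12%"), levels = lv)
apple_answer <- factor(c("22%", "66%", "48%"), levels = lv)

# as.integer() on a factor returns the position of each value in levels().
apple_dist <- as.integer(apple) - as.integer(apple_answer)
apple_dist
```

This only gives the intended result when the answer column uses the word's level order; otherwise the match-based version above is the safer choice.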

Upvotes: 1
