Reputation: 3768
set.seed(8)
df <- data.frame(n = rnorm(5,1), m = rnorm(5,0), l = factor(LETTERS[1:5]))
How can I make a new column in df conditioned on values, or combinations of values, of n, m and l? For instance, make a vector level and assign it low, medium or high based on the values of both n and m (pseudo-code):
df$level <- ifelse(df$n < 1 & df$m < 1, "low", ifelse(df$n > 1 & df$m > 1, "high", "medium")
This should give:
df$level
#low medium low low medium
Or, to assign a value to level based on the l column and a value in n (again, pseudo-code):
df$level <- ifelse(df$n < 1 & df$l == c("A", "B"), "low A/B", "high")
In this case one should get:
df$level
#"low A/B" "high" "high" "high" "high"
Upvotes: 1
Views: 16905
Reputation: 887048
You could also do:
c("high", "medium", "low")[rowSums(df[,-3] <1)+1]
#[1] "low" "medium" "low" "low" "medium"
c("high", "low A/B")[(df$n <1 &grepl("A|B", df$l)) +1]
#[1] "low A/B" "high" "high" "high" "high"
df[,-3] gets the subset of numeric columns, i.e. n and m.
df[,-3] <1 gives a logical matrix of TRUE and FALSE according to whether each element is <1 or not.
Applying rowSums to the above gives three possible values (0, 1, 2), depending on whether, in each row, neither value is <1, exactly one value is <1, or both are <1.
rowSums(df[,-3] <1) #in this example, there are no values equal to 0
#[1] 2 1 2 2 1
Adding 1 to the above gives us:
rowSums(df[,-3] <1) +1
#[1] 3 2 3 3 2
Using the above as a numeric index, we can do:
c("high", "medium", "low")[rowSums(df[,-3] <1)+1]
#[1] "low" "medium" "low" "low" "medium"
low occupies the positions where the index is 3 and medium those where it is 2; if there were a 1, high would occupy that position.
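As a quick sanity check, the index-based version can be compared against the nested ifelse from the question. The sketch below hard-codes the five n and m values printed elsewhere in this thread rather than relying on the random seed; the names below, by_index and by_ifelse are only illustrative.

```r
# Values of n and m as printed in this thread for set.seed(8)
df <- data.frame(
  n = c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404),
  m = c(-0.1078814, -0.1702891, -1.0883317, -3.0110517, -0.5931743),
  l = factor(LETTERS[1:5])
)

# Count how many of n and m are below 1 in each row (0, 1 or 2)
below <- rowSums(df[, c("n", "m")] < 1)

# Index-based labelling: 0 -> "high", 1 -> "medium", 2 -> "low"
by_index <- c("high", "medium", "low")[below + 1]

# Nested ifelse from the question, with the closing parenthesis added
by_ifelse <- ifelse(df$n < 1 & df$m < 1, "low",
                    ifelse(df$n > 1 & df$m > 1, "high", "medium"))

# Both approaches agree on these five rows
identical(by_index, by_ifelse)
```

The two versions can differ when a value is exactly 1: a row with n and m both equal to 1 is "medium" under the nested ifelse but "high" under the index, because the index only counts values strictly below 1.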
Upvotes: 2
Reputation: 44638
More of an extended comment than an answer, and perhaps not exactly what you're looking for.
Usually, when I need to capture groups of continuous variables and convert them to a single categorical variable, I use clustering and title the clusters according to the values presented. Here's an example using kmeans:
set.seed(8)
df <- data.frame(n = rnorm(5000,1), m = rnorm(5000,0), l = factor(LETTERS[1:5]))
km <- kmeans(df[1:2],7)   # fit once, so labels and printed centres come from the same run
df$Category <- km$cluster
km
K-means clustering with 7 clusters of sizes 593, 606, 649, 626, 641, 1219, 666
Cluster means:
           n           m
1 -0.2097451  0.84837728 # Low-High
2  1.0977826  1.44383531 # Mid-Upper
3  2.1682482 -0.70983193 # High-Low
4 -0.3389432 -0.54514302 # Low-Low
5  2.3332772  0.67415808 # High-Mid
6  0.9816709 -0.01549909 # Upper-Mid
7  0.8859904 -1.46126667 # Mid-Low
df$Category <- factor(df$Category, levels = 1:7, labels = c("Low-High","Mid-Upper","High-Low","Low-Low",...))
You would have to inspect the cluster means on your own machine (with the seed set) to label them appropriately, since the cluster numbering can differ between runs. This also gives you groupings based on your data rather than on an arbitrary threshold that you believe is correct for your data.
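If the labels only need to reflect the order of the cluster centres along one variable, the manual inspection step can be partly automated. The following is a minimal sketch, not the method used above: it assumes three clusters ordered by their centre on n, and the choice of nstart = 10, the label set low/medium/high, and the names km, ord and labs are all illustrative.

```r
set.seed(8)
df <- data.frame(n = rnorm(5000, 1), m = rnorm(5000, 0))

# Fit once and keep the result, so the centres and the cluster
# assignments come from the same run
km <- kmeans(df[c("n", "m")], centers = 3, nstart = 10)

# Rank the clusters by their centre on n: 1 = smallest, 3 = largest
ord <- rank(km$centers[, "n"])

# Map each cluster id to a label, then each row to its cluster's label
labs <- c("low", "medium", "high")[ord]
df$Category <- factor(labs[km$cluster], levels = c("low", "medium", "high"))

# Mean of n per category, which should increase from low to high
means <- tapply(df$n, df$Category, mean)
means
```

Labelling by rank keeps the labels stable even if the cluster numbers change from run to run, at the cost of describing each cluster by a single variable only.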
Upvotes: 0
Reputation: 81683
Here's a solution:
df$level1 <- c("low", "medium", "high")[rowMeans(sign(df[c("n", "m")] - 1)) + 2]
df$level2 <- c("high", "low A/B")[(df$n < 1 & df$l %in% c("A", "B")) + 1]
#           n          m l level1  level2
# 1 0.9154139 -0.1078814 A    low low A/B
# 2 1.8404001 -0.1702891 B medium    high
# 3 0.5365172 -1.0883317 C    low    high
# 4 0.4491650 -3.0110517 D    low    high
# 5 1.7360404 -0.5931743 E medium    high
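To see why the first line works, it can help to unpack it one step at a time. The sketch below hard-codes the n and m values printed above; the intermediate names s and idx are only illustrative.

```r
df <- data.frame(
  n = c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404),
  m = c(-0.1078814, -0.1702891, -1.0883317, -3.0110517, -0.5931743)
)

# sign() gives -1 for values below 1 and +1 for values above 1
s <- sign(df[c("n", "m")] - 1)

# Row means are -1 (both below), 0 (one each) or 1 (both above);
# adding 2 turns them into the positions 1, 2, 3
idx <- rowMeans(s) + 2

level1 <- c("low", "medium", "high")[idx]
level1
# "low" "medium" "low" "low" "medium"
```

One caveat: a value exactly equal to 1 has sign 0, so the row mean can be -0.5 or 0.5 and the resulting fractional index is truncated toward zero. If exact ties matter, an explicit comparison is probably safer.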
Upvotes: 3
Reputation: 1262
I'm probably missing the question, but when I add a missing closing parenthesis, it seems to work just fine:
> df$level <- ifelse(df$n < 1 & df$m < 1, "low", ifelse(df$n > 1 & df$m > 1, "high", "medium"))
> df
          n          m l  level
1 0.9154139 -0.1078814 A    low
2 1.8404001 -0.1702891 B medium
3 0.5365172 -1.0883317 C    low
4 0.4491650 -3.0110517 D    low
5 1.7360404 -0.5931743 E medium
> df$level
[1] "low" "medium" "low" "low" "medium"
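The second pseudo-code line from the question likely needs one more change: == against c("A", "B") recycles the two letters along the column instead of testing membership, so %in% is probably what was intended. A minimal sketch of the difference, using a made-up vector chosen so that the two comparisons disagree, followed by the corrected line on the n values printed above (the names letters4, eq and inset are hypothetical):

```r
# A made-up vector, chosen so that the two comparisons disagree
letters4 <- c("B", "A", "A", "B")

# == compares element by element, recycling c("A", "B") to A, B, A, B
eq <- letters4 == c("A", "B")
eq
# FALSE FALSE TRUE TRUE

# %in% asks, for each element, whether it is one of the listed values
inset <- letters4 %in% c("A", "B")
inset
# TRUE TRUE TRUE TRUE

# The question's second example, rewritten with %in%
df <- data.frame(
  n = c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404),
  l = factor(LETTERS[1:5])
)
df$level <- ifelse(df$n < 1 & df$l %in% c("A", "B"), "low A/B", "high")
df$level
# "low A/B" "high" "high" "high" "high"
```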
Upvotes: 1