Kit_ri

Reputation: 39

How can I create my own factor column in a data frame?

I have a data frame and this task: "Define your own criterion of income level, and split data according to levels of this criterion".

dput(head(creditcard))
structure(list(card = structure(c(2L, 2L, 2L, 2L, 2L, 2L), levels = c("no", 
"yes"), class = "factor"), reports = c(0L, 0L, 0L, 0L, 0L, 0L
), age = c(37.66667, 33.25, 33.66667, 30.5, 32.16667, 23.25), 
    income = c(4.52, 2.42, 4.5, 2.54, 9.7867, 2.5), share = c(0.03326991, 
    0.005216942, 0.004155556, 0.06521378, 0.06705059, 0.0444384
    ), expenditure = c(124.9833, 9.854167, 15, 137.8692, 546.5033, 
    91.99667), owner = structure(c(2L, 1L, 2L, 1L, 2L, 1L), levels = c("no", 
    "yes"), class = "factor"), selfemp = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L), levels = c("no", "yes"), class = "factor"), 
    dependents = c(3L, 3L, 4L, 0L, 2L, 0L), days = c(54L, 34L, 
    58L, 25L, 64L, 54L), majorcards = c(1L, 1L, 1L, 1L, 1L, 1L
    ), active = c(12L, 13L, 5L, 7L, 5L, 1L), income_fam = c(1.13, 
    0.605, 0.9, 2.54, 3.26223333333333, 2.5)), row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

I defined the criterion like this:

inc_l<-c("low","average","above average","high")
grad_fact<-function(x){
  ifelse(x>=10, 'high',
         ifelse(x>6 && x<10, 'above average',
                ifelse(x>=3 && x<=6,'average',
                       ifelse(x<3, 'low'))))
}

and added the column like this:

creditcard<-transform(creditcard, incom_level=factor(sapply(creditcard$income, grad_fact), inc_l, ordered = TRUE))

However, I need to do this without sapply, so I tried the following:

creditcard<-transform(creditcard, incom_level=factor(grad_fact(creditcard$income),inc_l, ordered = TRUE))

In this case, every element of the column takes the value "average", and I don't understand why. Please help me figure out the problem.

Answers (1)

akrun

Reputation: 887048

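We need to change the && to &. && is meant for a single condition and does not return one TRUE/FALSE per element the way & does, so the nested ifelse ends up with a single value that is recycled to every row. According to ?"&&":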

& and && indicate logical AND and | and || indicate logical OR. The shorter forms performs elementwise comparisons in much the same way as arithmetic operators. The longer forms evaluates left to right, proceeding only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.
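
As a small illustration of the difference, here is a sketch using three of the income values from the question. The vector x and the helper variable inner are introduced only for this example. To keep the code runnable on any R version, && is applied to x[1] explicitly: older versions of R silently used only the first element when && was given a longer vector, while R 4.3.0 and later raise an error instead.

```r
x <- c(4.52, 2.42, 9.7867)   # a few income values from the question

# `&` compares element by element and returns one result per element
x > 6 & x < 10
# [1] FALSE FALSE  TRUE

# `&&` is meant for a single condition, e.g. inside if();
# here it is applied to the first element only
x[1] > 6 && x[1] < 10
# [1] FALSE

# ifelse() returns a result as long as its test, so a length-one
# inner result is recycled across every element of the outer call
inner <- ifelse(x[1] >= 3 && x[1] <= 6, "average", "low")   # length 1
ifelse(x >= 10, "high", inner)
# [1] "average" "average" "average"
```

This reproduces the symptom in the question: the inner calls collapse to the single value "average", which the outermost ifelse then repeats for every row.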

In addition, the innermost ifelse was missing its no argument, so NA_character_ is supplied here:

grad_fact<-function(x){
  ifelse(x>=10, 'high',
         ifelse(x>6 & x<10, 'above average',
                ifelse(x>=3 & x<=6,'average',
                       ifelse(x<3, 'low', NA_character_))))
}

and then use

creditcard <- transform(creditcard, incom_level=
       factor(grad_fact(income),inc_l, ordered = TRUE))

-output

creditcard
  card reports      age income       share expenditure owner selfemp dependents days majorcards active income_fam   incom_level
1  yes       0 37.66667 4.5200 0.033269910  124.983300   yes      no          3   54          1     12   1.130000       average
2  yes       0 33.25000 2.4200 0.005216942    9.854167    no      no          3   34          1     13   0.605000           low
3  yes       0 33.66667 4.5000 0.004155556   15.000000   yes      no          4   58          1      5   0.900000       average
4  yes       0 30.50000 2.5400 0.065213780  137.869200    no      no          0   25          1      7   2.540000           low
5  yes       0 32.16667 9.7867 0.067050590  546.503300   yes      no          2   64          1      5   3.262233 above average
6  yes       0 23.25000 2.5000 0.044438400   91.996670    no      no          0   54          1      1   2.500000           low
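
As an alternative to nested ifelse calls, the same kind of binning can be sketched with base R's cut(), which is vectorised and returns an ordered factor directly. This is a different technique from the one above, not a drop-in replacement: with right = FALSE every interval is closed on the left, so an income of exactly 6 falls into "above average" here, whereas grad_fact puts it in "average". The vector income below is copied from the question's dput() output.

```r
inc_l <- c("low", "average", "above average", "high")
income <- c(4.52, 2.42, 4.5, 2.54, 9.7867, 2.5)   # income column from the question

# right = FALSE gives the intervals [-Inf,3), [3,6), [6,10), [10,Inf)
income_level <- cut(income,
                    breaks = c(-Inf, 3, 6, 10, Inf),
                    labels = inc_l,
                    right = FALSE,
                    ordered_result = TRUE)

as.character(income_level)
# average, low, average, low, above average, low
```

For the six rows shown, this gives the same levels as the corrected grad_fact. To store the result in the data frame, something like creditcard$incom_level <- cut(creditcard$income, ...) with the same arguments should work.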
