Student Work

Reputation: 31

Issues replacing value in R

I am trying to replace some values for a variable within my data set but I keep getting an unexpected value of 414 assigned instead of 9. I've been over the code a number of times but just cannot get it working.

My code

#replace tumor_size with dummy variable 
Bcdata$Tumor_size=gsub('0-4',1,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('5-9',2,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('10-14',3,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('15-19',4,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('20-24',5,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('25-29',6,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('30-34',7,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('35-39',8,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('40-44',9,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('45-49',10,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('50-54',11,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('55-59',12,Bcdata$Tumor_size)

Table before and after I run my code

> table(Bcdata$Tumor_size)

  0-4 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49   5-9 50-54 
    8    28    30    50    54    60    19    22     3     4     8

> table(Bcdata$Tumor_size)

  1  10  11   2   3   4 414   5   6   7   8 
  8   3   8   4  28  30  22  50  54  60  19 
> 

And a sample of the data.

> head(Bcdata)
                 Class   Age Menopause Tumor_size Inv_nodes Node_caps Deg_malig Breast Irradiate
1 no-recurrence-events 30-39   premeno      30-34       0-2        no         3   left        no
2 no-recurrence-events 40-49   premeno      20-24       0-2        no         2  right        no
3 no-recurrence-events 40-49   premeno      20-24       0-2        no         2   left        no
4 no-recurrence-events 60-69      ge40      15-19       0-2        no         2  right        no
5 no-recurrence-events 40-49   premeno        0-4       0-2        no         2  right        no
6 no-recurrence-events 60-69      ge40      15-19       0-2        no         2   left        no
> tail(Bcdata)
                Class   Age Menopause Tumor_size Inv_nodes Node_caps Deg_malig Breast Irradiate
281 recurrence-events 50-59      ge40      40-44       6-8       yes         3   left       yes
282 recurrence-events 30-39   premeno      30-34       0-2        no         2   left        no
283 recurrence-events 30-39   premeno      20-24       0-2        no         3   left       yes
284 recurrence-events 60-69      ge40      20-24       0-2        no         1  right        no
285 recurrence-events 40-49      ge40      30-34       3-5        no         3   left        no
286 recurrence-events 50-59      ge40      30-34       3-5        no         3   left        no

The code looks right to me, but I keep rewriting it anyway, resetting the data to the raw values, and running it again, and the same thing happens every time. Help!!

EDIT: as requested, partial and full dput

> dput(Bcdata$Tumor_size)
structure(c(6L, 4L, 4L, 3L, 1L, 3L, 5L, 4L, 11L, 4L, 1L, 5L, 
2L, 5L, 6L, 6L, 3L, 6L, 6L, 6L, 8L, 3L, 5L, 8L, 7L, 5L, 4L, 5L, 
8L, 6L, 8L, 3L, 2L, 2L, 2L, 6L, 1L, 3L, 2L, 6L, 4L, 5L, 10L, 
2L, 11L, 6L, 5L, 5L, 4L, 4L, 3L, 4L, 3L, 4L, 8L, 8L, 1L, 10L, 
6L, 3L, 4L, 2L, 1L, 7L, 5L, 2L, 5L, 4L, 7L, 11L, 2L, 5L, 4L, 
3L, 10L, 2L, 2L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 4L, 7L, 5L, 1L, 
4L, 8L, 1L, 4L, 5L, 4L, 2L, 6L, 6L, 3L, 6L, 5L, 4L, 6L, 5L, 4L, 
2L, 6L, 4L, 8L, 6L, 6L, 5L, 3L, 4L, 2L, 7L, 4L, 3L, 4L, 2L, 3L, 
4L, 3L, 8L, 6L, 2L, 2L, 6L, 5L, 5L, 7L, 7L, 8L, 6L, 8L, 6L, 4L, 
8L, 10L, 8L, 6L, 8L, 4L, 2L, 9L, 9L, 5L, 11L, 6L, 4L, 6L, 5L, 
6L, 7L, 3L, 3L, 8L, 5L, 6L, 6L, 7L, 5L, 6L, 2L, 5L, 5L, 4L, 4L, 
8L, 2L, 6L, 4L, 3L, 6L, 4L, 5L, 6L, 5L, 2L, 5L, 4L, 7L, 7L, 5L, 
6L, 6L, 4L, 5L, 3L, 2L, 4L, 3L, 5L, 6L, 2L, 11L, 7L, 2L, 2L, 
3L, 5L, 5L, 3L, 8L, 7L, 5L, 1L, 6L, 5L, 6L, 7L, 4L, 4L, 6L, 5L, 
8L, 4L, 4L, 3L, 6L, 3L, 5L, 6L, 5L, 4L, 5L, 4L, 6L, 6L, 8L, 9L, 
11L, 6L, 6L, 3L, 6L, 5L, 5L, 5L, 7L, 4L, 4L, 3L, 5L, 4L, 6L, 
6L, 3L, 6L, 7L, 4L, 5L, 11L, 8L, 11L, 6L, 6L, 6L, 4L, 6L, 6L, 
5L, 5L, 5L, 4L, 4L, 7L, 6L, 4L, 7L, 5L, 6L, 5L, 3L, 6L, 6L, 5L, 
5L, 2L, 7L, 8L, 8L, 6L, 4L, 4L, 6L, 6L), .Label = c("0-4", "10-14", 
"15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", 
"5-9", "50-54"), class = "factor")
> dput(Bcdata)
structure(list(Class = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("no-recurrence-events", 
"recurrence-events"), class = "factor"), Age = structure(c(2L, 
3L, 3L, 5L, 3L, 5L, 4L, 5L, 3L, 3L, 3L, 4L, 5L, 4L, 3L, 5L, 3L, 
4L, 5L, 4L, 4L, 5L, 2L, 4L, 4L, 3L, 4L, 5L, 3L, 5L, 4L, 4L, 4L, 
4L, 4L, 2L, 4L, 4L, 3L, 3L, 4L, 5L, 5L, 3L, 4L, 4L, 3L, 4L, 3L, 
3L, 4L, 2L, 4L, 6L, 6L, 6L, 4L, 4L, 5L, 5L, 3L, 3L, 4L, 1L, 3L, 
3L, 3L, 4L, 4L, 5L, 5L, 3L, 5L, 4L, 2L, 4L, 4L, 2L, 4L, 3L, 4L, 
5L, 5L, 4L, 3L, 4L, 5L, 6L, 4L, 3L, 2L, 4L, 4L, 5L, 4L, 3L, 5L, 
5L, 3L, 2L, 3L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 3L, 5L, 4L, 4L, 3L, 
3L, 3L, 4L, 2L, 3L, 2L, 5L, 5L, 4L, 4L, 4L, 5L, 6L, 2L, 2L, 4L, 
3L, 3L, 3L, 3L, 4L, 5L, 2L, 2L, 3L, 2L, 3L, 4L, 4L, 5L, 3L, 5L, 
3L, 5L, 4L, 2L, 4L, 4L, 5L, 4L, 5L, 2L, 5L, 4L, 4L, 4L, 3L, 3L, 
3L, 5L, 5L, 5L, 3L, 3L, 3L, 4L, 3L, 2L, 2L, 5L, 4L, 4L, 3L, 3L, 
5L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 4L, 5L, 3L, 4L, 3L, 3L, 4L, 
2L, 4L, 4L, 4L, 3L, 4L, 4L, 5L, 4L, 3L, 4L, 4L, 2L, 4L, 4L, 4L, 
3L, 3L, 4L, 3L, 4L, 5L, 3L, 4L, 3L, 5L, 2L, 3L, 2L, 5L, 5L, 2L, 
3L, 3L, 4L, 5L, 5L, 4L, 3L, 2L, 6L, 5L, 4L, 3L, 3L, 2L, 3L, 5L, 
3L, 4L, 4L, 3L, 2L, 2L, 4L, 5L, 2L, 3L, 3L, 2L, 5L, 3L, 3L, 3L, 
3L, 4L, 4L, 5L, 3L, 5L, 4L, 4L, 2L, 3L, 5L, 2L, 3L, 4L, 4L, 3L, 
5L, 5L, 3L, 2L, 5L, 4L, 4L, 4L, 2L, 2L, 5L, 3L, 4L), .Label = c("20-29", 
"30-39", "40-49", "50-59", "60-69", "70-79"), class = "factor"), 
    Menopause = structure(c(3L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 
    3L, 3L, 1L, 2L, 1L, 3L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 
    3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 1L, 3L, 
    3L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 
    1L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 
    3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 
    3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 
    3L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 3L, 
    1L, 3L, 1L, 3L, 3L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 
    3L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 
    3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 1L, 1L, 3L, 
    1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 1L, 
    3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 1L, 
    3L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 
    1L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 
    3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 
    2L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 3L, 
    1L, 1L, 3L, 3L, 1L, 1L, 1L), .Label = c("ge40", "lt40", "premeno"
    ), class = "factor"), Tumor_size = structure(c(6L, 4L, 4L, 
    3L, 1L, 3L, 5L, 4L, 11L, 4L, 1L, 5L, 2L, 5L, 6L, 6L, 3L, 
    6L, 6L, 6L, 8L, 3L, 5L, 8L, 7L, 5L, 4L, 5L, 8L, 6L, 8L, 3L, 
    2L, 2L, 2L, 6L, 1L, 3L, 2L, 6L, 4L, 5L, 10L, 2L, 11L, 6L, 
    5L, 5L, 4L, 4L, 3L, 4L, 3L, 4L, 8L, 8L, 1L, 10L, 6L, 3L, 
    4L, 2L, 1L, 7L, 5L, 2L, 5L, 4L, 7L, 11L, 2L, 5L, 4L, 3L, 
    10L, 2L, 2L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 4L, 7L, 5L, 1L, 
    4L, 8L, 1L, 4L, 5L, 4L, 2L, 6L, 6L, 3L, 6L, 5L, 4L, 6L, 5L, 
    4L, 2L, 6L, 4L, 8L, 6L, 6L, 5L, 3L, 4L, 2L, 7L, 4L, 3L, 4L, 
    2L, 3L, 4L, 3L, 8L, 6L, 2L, 2L, 6L, 5L, 5L, 7L, 7L, 8L, 6L, 
    8L, 6L, 4L, 8L, 10L, 8L, 6L, 8L, 4L, 2L, 9L, 9L, 5L, 11L, 
    6L, 4L, 6L, 5L, 6L, 7L, 3L, 3L, 8L, 5L, 6L, 6L, 7L, 5L, 6L, 
    2L, 5L, 5L, 4L, 4L, 8L, 2L, 6L, 4L, 3L, 6L, 4L, 5L, 6L, 5L, 
    2L, 5L, 4L, 7L, 7L, 5L, 6L, 6L, 4L, 5L, 3L, 2L, 4L, 3L, 5L, 
    6L, 2L, 11L, 7L, 2L, 2L, 3L, 5L, 5L, 3L, 8L, 7L, 5L, 1L, 
    6L, 5L, 6L, 7L, 4L, 4L, 6L, 5L, 8L, 4L, 4L, 3L, 6L, 3L, 5L, 
    6L, 5L, 4L, 5L, 4L, 6L, 6L, 8L, 9L, 11L, 6L, 6L, 3L, 6L, 
    5L, 5L, 5L, 7L, 4L, 4L, 3L, 5L, 4L, 6L, 6L, 3L, 6L, 7L, 4L, 
    5L, 11L, 8L, 11L, 6L, 6L, 6L, 4L, 6L, 6L, 5L, 5L, 5L, 4L, 
    4L, 7L, 6L, 4L, 7L, 5L, 6L, 5L, 3L, 6L, 6L, 5L, 5L, 2L, 7L, 
    8L, 8L, 6L, 4L, 4L, 6L, 6L), .Label = c("0-4", "10-14", "15-19", 
    "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "5-9", 
    "50-54"), class = "factor"), Inv_nodes = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 6L, 6L, 1L, 7L, 7L, 5L, 6L, 1L, 1L, 5L, 
    5L, 1L, 1L, 1L, 5L, 5L, 1L, 1L, 6L, 1L, 1L, 5L, 1L, 1L, 3L, 
    5L, 3L, 1L, 1L, 5L, 5L, 1L, 1L, 1L, 1L, 5L, 1L, 5L, 5L, 5L, 
    5L, 3L, 1L, 1L, 5L, 1L, 6L, 5L, 5L, 1L, 1L, 1L, 5L, 1L, 1L, 
    1L, 1L, 7L, 7L, 6L, 1L, 1L, 1L, 1L, 2L, 1L, 6L, 1L, 1L, 1L, 
    5L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 
    3L, 1L, 5L, 1L, 7L, 5L, 5L, 7L, 1L, 5L, 1L, 1L, 1L, 5L, 5L, 
    3L, 6L, 5L, 2L, 7L, 6L, 7L, 6L, 5L, 1L, 1L, 1L, 1L, 1L, 6L, 
    1L, 5L, 6L, 5L, 5L, 2L, 1L, 1L, 1L, 7L, 5L, 4L, 1L, 1L, 6L, 
    1L, 1L, 1L, 5L, 7L, 6L, 6L, 3L, 6L, 6L, 1L, 1L, 1L, 5L, 5L
    ), .Label = c("0-2", "12-14", "15-17", "24-26", "3-5", "6-8", 
    "9-11"), class = "factor"), Node_caps = structure(c(2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 3L, 3L, 
    2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 1L, 1L, 2L, 2L, 
    3L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
    2L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 
    2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 
    2L, 3L, 2L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 3L, 
    2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 
    3L, 3L, 2L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 3L, 2L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L), .Label = c("?", 
    "no", "yes"), class = "factor"), Deg_malig = c(3L, 2L, 2L, 
    2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 1L, 3L, 3L, 1L, 2L, 3L, 
    3L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 3L, 2L, 3L, 
    1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
    1L, 1L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 
    2L, 1L, 1L, 1L, 3L, 3L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 
    2L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 3L, 1L, 2L, 3L, 2L, 
    2L, 1L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 2L, 1L, 2L, 2L, 
    3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 3L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L, 
    2L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 2L, 3L, 1L, 1L, 1L, 3L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 
    3L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 
    2L, 1L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 
    1L, 2L, 2L, 2L, 2L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 
    3L, 3L, 2L, 3L, 3L, 3L, 2L, 3L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 
    2L, 3L, 2L, 3L, 3L, 1L, 1L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 
    2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 3L, 3L), Breast = structure(c(1L, 
    2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 
    2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 
    2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 
    1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 
    1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 
    2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 
    2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 
    1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 
    2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 
    2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 
    1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 
    1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L
    ), .Label = c("left", "right"), class = "factor"), Breast_quad = structure(c(3L, 
    6L, 3L, 4L, 5L, 3L, 3L, 3L, 3L, 4L, 2L, 3L, 6L, 6L, 4L, 3L, 
    3L, 3L, 3L, 6L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 4L, 
    3L, 3L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 4L, 6L, 4L, 
    3L, 4L, 6L, 3L, 3L, 5L, 3L, 4L, 4L, 6L, 2L, 6L, 4L, 4L, 2L, 
    5L, 3L, 6L, 5L, 4L, 5L, 4L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 3L, 
    3L, 2L, 3L, 2L, 3L, 4L, 3L, 3L, 5L, 4L, 3L, 5L, 4L, 4L, 2L, 
    4L, 4L, 4L, 3L, 5L, 4L, 4L, 6L, 3L, 3L, 3L, 5L, 5L, 3L, 4L, 
    4L, 6L, 6L, 4L, 3L, 2L, 4L, 4L, 6L, 4L, 3L, 4L, 3L, 5L, 3L, 
    6L, 4L, 3L, 3L, 2L, 6L, 4L, 4L, 4L, 6L, 4L, 4L, 6L, 3L, 2L, 
    6L, 3L, 3L, 5L, 3L, 3L, 4L, 3L, 2L, 5L, 4L, 3L, 2L, 4L, 4L, 
    3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 4L, 3L, 4L, 4L, 3L, 4L, 
    3L, 4L, 4L, 4L, 4L, 3L, 6L, 4L, 3L, 6L, 3L, 3L, 4L, 3L, 4L, 
    3L, 3L, 4L, 3L, 3L, 5L, 4L, 4L, 4L, 5L, 4L, 3L, 5L, 4L, 4L, 
    4L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 6L, 2L, 1L, 6L, 6L, 4L, 3L, 
    2L, 6L, 4L, 3L, 4L, 4L, 4L, 2L, 3L, 6L, 4L, 5L, 3L, 3L, 3L, 
    3L, 4L, 3L, 6L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 3L, 6L, 3L, 
    3L, 3L, 6L, 4L, 4L, 3L, 5L, 3L, 3L, 4L, 3L, 4L, 4L, 6L, 4L, 
    3L, 3L, 5L, 4L, 6L, 5L, 4L, 4L, 3L, 3L, 6L, 3L, 3L, 3L, 5L, 
    3L, 4L, 6L, 2L, 4L, 5L, 4L, 6L, 3L, 3L, 4L, 4L, 4L, 3L, 3L
    ), .Label = c("?", "central", "left_low", "left_up", "right_low", 
    "right_up"), class = "factor"), Irradiate = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 
    2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
    1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 
    1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 
    1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 
    1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
    1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 
    2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L
    ), .Label = c("no", "yes"), class = "factor")), class = "data.frame", row.names = c(NA, 
-286L))

Upvotes: 0

Views: 121

Answers (3)

Ben Bolker

Reputation: 226087

Unless I'm missing something, you're working way harder than you have to.

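In your data, Tumor_size is already a factor, so as.numeric() will return the integer code of each level. One caveat: the levels are stored in alphabetical order, so "5-9" is level 10 (after "45-49") rather than level 2. Reorder the levels first if the codes need to follow tumor size.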

table(as.numeric(Bcdata$Tumor_size))

 1  2  3  4  5  6  7  8  9 10 11 
 8 28 30 50 54 60 19 22  3  4  8 
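
As the dput output in the question shows, the factor levels are sorted alphabetically, so the integer codes only follow tumor size once the levels are reordered. A minimal sketch of one way to do that, using a small hypothetical vector `size` in place of Bcdata$Tumor_size (the names `size`, `lower`, `ordered_levels` and `size_code` are illustrative, not from the question):

```r
# Hypothetical toy vector standing in for Bcdata$Tumor_size
size <- factor(c("0-4", "5-9", "10-14", "40-44", "5-9"))

# Extract the lower bound of each label ("10-14" -> 10)
lower <- as.numeric(sub("-.*$", "", levels(size)))

# Put the labels in increasing order of that lower bound
ordered_levels <- levels(size)[order(lower)]

# Rebuild the factor with the reordered levels, then take the integer codes
size_code <- as.integer(factor(size, levels = ordered_levels))
size_code
# [1] 1 2 3 4 2
```

Only levels that actually occur in the data get a code this way. If a fixed coding is needed (for example 9 for "40-44" regardless of which bins are present), pass the full list of labels as `levels` instead.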

Upvotes: 2

s_pike

Reputation: 2113

If you want a really quick solution, you could just change the pattern to match exactly:

Bcdata$Tumor_size=gsub('^0-4$',1,Bcdata$Tumor_size)

reference: Match exact string
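
The same anchoring would be needed on every pattern, not just the first. A sketch of how that could be done in a loop, assuming the twelve labels from the question's code and a hypothetical character vector `size` standing in for as.character(Bcdata$Tumor_size):

```r
# Hypothetical stand-in for as.character(Bcdata$Tumor_size)
size <- c("0-4", "40-44", "10-14", "5-9")

# Labels in size order; the position in this vector is the desired code
bins <- c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
          "30-34", "35-39", "40-44", "45-49", "50-54", "55-59")

for (i in seq_along(bins)) {
  # ^ and $ anchor the pattern so it must match the whole string
  size <- sub(paste0("^", bins[i], "$"), i, size)
}

# The replacements are still character strings; convert to integer
size <- as.integer(size)
size
# [1] 1 9 3 2
```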

Upvotes: 2

George Savva

Reputation: 5306

'40-44' is being changed to '414' by the first gsub call, because the pattern '0-4' matches the middle part of the string:

Bcdata$Tumor_size=gsub('0-4',1,Bcdata$Tumor_size)

You should use a proper recoding function, or encode into a factor then use as.numeric to turn it into integer dummy values.
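
To illustrate, the partial match can be reproduced on a single string, and a whole-string lookup avoids it. This is a sketch only: `bins`, `size`, `broken` and `codes` are hypothetical names, and base R's match() is used here as one possible recoding approach, not necessarily the one the answer has in mind.

```r
# The unanchored pattern matches inside "40-44": 4 [0-4] 4
broken <- gsub("0-4", 1, "40-44")
broken
# [1] "414"

# Labels in size order; the position in this vector is the desired code
bins <- c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
          "30-34", "35-39", "40-44", "45-49", "50-54", "55-59")

# Hypothetical stand-in for Bcdata$Tumor_size
size <- c("40-44", "0-4", "30-34")

# match() compares whole strings, so "0-4" cannot hit inside "40-44"
codes <- match(size, bins)
codes
# [1] 9 1 7
```

match() also accepts a factor as its first argument (it is converted to character), and it returns NA for any label missing from `bins`, which makes unexpected values easy to spot.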

Upvotes: 4
