Reputation: 31
I am trying to recode the values of a variable in my data set, but one category keeps coming out as 414 instead of 9. I've been over the code a number of times and cannot see what is wrong.
My code
#replace tumor_size with dummy variable
Bcdata$Tumor_size=gsub('0-4',1,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('5-9',2,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('10-14',3,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('15-19',4,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('20-24',5,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('25-29',6,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('30-34',7,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('35-39',8,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('40-44',9,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('45-49',10,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('50-54',11,Bcdata$Tumor_size)
Bcdata$Tumor_size=gsub('55-59',12,Bcdata$Tumor_size)
Table before and after I run my code
> table(Bcdata$Tumor_size)
0-4 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49 5-9 50-54
8 28 30 50 54 60 19 22 3 4 8
> table(Bcdata$Tumor_size)
1 10 11 2 3 4 414 5 6 7 8
8 3 8 4 28 30 22 50 54 60 19
>
And a sample of the data.
> head(Bcdata)
Class Age Menopause Tumor_size Inv_nodes Node_caps Deg_malig Breast Irradiate
1 no-recurrence-events 30-39 premeno 30-34 0-2 no 3 left no
2 no-recurrence-events 40-49 premeno 20-24 0-2 no 2 right no
3 no-recurrence-events 40-49 premeno 20-24 0-2 no 2 left no
4 no-recurrence-events 60-69 ge40 15-19 0-2 no 2 right no
5 no-recurrence-events 40-49 premeno 0-4 0-2 no 2 right no
6 no-recurrence-events 60-69 ge40 15-19 0-2 no 2 left no
> tail(Bcdata)
Class Age Menopause Tumor_size Inv_nodes Node_caps Deg_malig Breast Irradiate
281 recurrence-events 50-59 ge40 40-44 6-8 yes 3 left yes
282 recurrence-events 30-39 premeno 30-34 0-2 no 2 left no
283 recurrence-events 30-39 premeno 20-24 0-2 no 3 left yes
284 recurrence-events 60-69 ge40 20-24 0-2 no 1 right no
285 recurrence-events 40-49 ge40 30-34 3-5 no 3 left no
286 recurrence-events 50-59 ge40 30-34 3-5 no 3 left no
The code looks right to me. I have rewritten it several times, each time resetting the data to the raw values and running it again, but the same thing keeps happening. Help!
EDIT: as requested, partial and full dput
> dput(Bcdata$Tumor_size)
structure(c(6L, 4L, 4L, 3L, 1L, 3L, 5L, 4L, 11L, 4L, 1L, 5L,
2L, 5L, 6L, 6L, 3L, 6L, 6L, 6L, 8L, 3L, 5L, 8L, 7L, 5L, 4L, 5L,
8L, 6L, 8L, 3L, 2L, 2L, 2L, 6L, 1L, 3L, 2L, 6L, 4L, 5L, 10L,
2L, 11L, 6L, 5L, 5L, 4L, 4L, 3L, 4L, 3L, 4L, 8L, 8L, 1L, 10L,
6L, 3L, 4L, 2L, 1L, 7L, 5L, 2L, 5L, 4L, 7L, 11L, 2L, 5L, 4L,
3L, 10L, 2L, 2L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 4L, 7L, 5L, 1L,
4L, 8L, 1L, 4L, 5L, 4L, 2L, 6L, 6L, 3L, 6L, 5L, 4L, 6L, 5L, 4L,
2L, 6L, 4L, 8L, 6L, 6L, 5L, 3L, 4L, 2L, 7L, 4L, 3L, 4L, 2L, 3L,
4L, 3L, 8L, 6L, 2L, 2L, 6L, 5L, 5L, 7L, 7L, 8L, 6L, 8L, 6L, 4L,
8L, 10L, 8L, 6L, 8L, 4L, 2L, 9L, 9L, 5L, 11L, 6L, 4L, 6L, 5L,
6L, 7L, 3L, 3L, 8L, 5L, 6L, 6L, 7L, 5L, 6L, 2L, 5L, 5L, 4L, 4L,
8L, 2L, 6L, 4L, 3L, 6L, 4L, 5L, 6L, 5L, 2L, 5L, 4L, 7L, 7L, 5L,
6L, 6L, 4L, 5L, 3L, 2L, 4L, 3L, 5L, 6L, 2L, 11L, 7L, 2L, 2L,
3L, 5L, 5L, 3L, 8L, 7L, 5L, 1L, 6L, 5L, 6L, 7L, 4L, 4L, 6L, 5L,
8L, 4L, 4L, 3L, 6L, 3L, 5L, 6L, 5L, 4L, 5L, 4L, 6L, 6L, 8L, 9L,
11L, 6L, 6L, 3L, 6L, 5L, 5L, 5L, 7L, 4L, 4L, 3L, 5L, 4L, 6L,
6L, 3L, 6L, 7L, 4L, 5L, 11L, 8L, 11L, 6L, 6L, 6L, 4L, 6L, 6L,
5L, 5L, 5L, 4L, 4L, 7L, 6L, 4L, 7L, 5L, 6L, 5L, 3L, 6L, 6L, 5L,
5L, 2L, 7L, 8L, 8L, 6L, 4L, 4L, 6L, 6L), .Label = c("0-4", "10-14",
"15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49",
"5-9", "50-54"), class = "factor")
> dput(Bcdata)
structure(list(Class = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("no-recurrence-events",
"recurrence-events"), class = "factor"), Age = structure(c(2L,
3L, 3L, 5L, 3L, 5L, 4L, 5L, 3L, 3L, 3L, 4L, 5L, 4L, 3L, 5L, 3L,
4L, 5L, 4L, 4L, 5L, 2L, 4L, 4L, 3L, 4L, 5L, 3L, 5L, 4L, 4L, 4L,
4L, 4L, 2L, 4L, 4L, 3L, 3L, 4L, 5L, 5L, 3L, 4L, 4L, 3L, 4L, 3L,
3L, 4L, 2L, 4L, 6L, 6L, 6L, 4L, 4L, 5L, 5L, 3L, 3L, 4L, 1L, 3L,
3L, 3L, 4L, 4L, 5L, 5L, 3L, 5L, 4L, 2L, 4L, 4L, 2L, 4L, 3L, 4L,
5L, 5L, 4L, 3L, 4L, 5L, 6L, 4L, 3L, 2L, 4L, 4L, 5L, 4L, 3L, 5L,
5L, 3L, 2L, 3L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 3L, 5L, 4L, 4L, 3L,
3L, 3L, 4L, 2L, 3L, 2L, 5L, 5L, 4L, 4L, 4L, 5L, 6L, 2L, 2L, 4L,
3L, 3L, 3L, 3L, 4L, 5L, 2L, 2L, 3L, 2L, 3L, 4L, 4L, 5L, 3L, 5L,
3L, 5L, 4L, 2L, 4L, 4L, 5L, 4L, 5L, 2L, 5L, 4L, 4L, 4L, 3L, 3L,
3L, 5L, 5L, 5L, 3L, 3L, 3L, 4L, 3L, 2L, 2L, 5L, 4L, 4L, 3L, 3L,
5L, 4L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 4L, 5L, 3L, 4L, 3L, 3L, 4L,
2L, 4L, 4L, 4L, 3L, 4L, 4L, 5L, 4L, 3L, 4L, 4L, 2L, 4L, 4L, 4L,
3L, 3L, 4L, 3L, 4L, 5L, 3L, 4L, 3L, 5L, 2L, 3L, 2L, 5L, 5L, 2L,
3L, 3L, 4L, 5L, 5L, 4L, 3L, 2L, 6L, 5L, 4L, 3L, 3L, 2L, 3L, 5L,
3L, 4L, 4L, 3L, 2L, 2L, 4L, 5L, 2L, 3L, 3L, 2L, 5L, 3L, 3L, 3L,
3L, 4L, 4L, 5L, 3L, 5L, 4L, 4L, 2L, 3L, 5L, 2L, 3L, 4L, 4L, 3L,
5L, 5L, 3L, 2L, 5L, 4L, 4L, 4L, 2L, 2L, 5L, 3L, 4L), .Label = c("20-29",
"30-39", "40-49", "50-59", "60-69", "70-79"), class = "factor"),
Menopause = structure(c(3L, 3L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
3L, 3L, 1L, 2L, 1L, 3L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L,
3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 1L, 3L,
3L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 1L,
3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 3L,
3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 3L, 3L,
3L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 3L,
1L, 3L, 1L, 3L, 3L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L,
3L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 1L, 1L,
3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 1L, 1L, 3L,
1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 1L, 3L, 3L, 1L,
3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 1L,
3L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 1L,
1L, 3L, 3L, 3L, 3L, 3L, 1L, 3L, 1L, 1L, 3L, 3L, 3L, 1L, 1L,
3L, 3L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 3L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 3L,
1L, 1L, 3L, 3L, 1L, 1L, 1L), .Label = c("ge40", "lt40", "premeno"
), class = "factor"), Tumor_size = structure(c(6L, 4L, 4L,
3L, 1L, 3L, 5L, 4L, 11L, 4L, 1L, 5L, 2L, 5L, 6L, 6L, 3L,
6L, 6L, 6L, 8L, 3L, 5L, 8L, 7L, 5L, 4L, 5L, 8L, 6L, 8L, 3L,
2L, 2L, 2L, 6L, 1L, 3L, 2L, 6L, 4L, 5L, 10L, 2L, 11L, 6L,
5L, 5L, 4L, 4L, 3L, 4L, 3L, 4L, 8L, 8L, 1L, 10L, 6L, 3L,
4L, 2L, 1L, 7L, 5L, 2L, 5L, 4L, 7L, 11L, 2L, 5L, 4L, 3L,
10L, 2L, 2L, 5L, 5L, 5L, 2L, 2L, 3L, 3L, 4L, 7L, 5L, 1L,
4L, 8L, 1L, 4L, 5L, 4L, 2L, 6L, 6L, 3L, 6L, 5L, 4L, 6L, 5L,
4L, 2L, 6L, 4L, 8L, 6L, 6L, 5L, 3L, 4L, 2L, 7L, 4L, 3L, 4L,
2L, 3L, 4L, 3L, 8L, 6L, 2L, 2L, 6L, 5L, 5L, 7L, 7L, 8L, 6L,
8L, 6L, 4L, 8L, 10L, 8L, 6L, 8L, 4L, 2L, 9L, 9L, 5L, 11L,
6L, 4L, 6L, 5L, 6L, 7L, 3L, 3L, 8L, 5L, 6L, 6L, 7L, 5L, 6L,
2L, 5L, 5L, 4L, 4L, 8L, 2L, 6L, 4L, 3L, 6L, 4L, 5L, 6L, 5L,
2L, 5L, 4L, 7L, 7L, 5L, 6L, 6L, 4L, 5L, 3L, 2L, 4L, 3L, 5L,
6L, 2L, 11L, 7L, 2L, 2L, 3L, 5L, 5L, 3L, 8L, 7L, 5L, 1L,
6L, 5L, 6L, 7L, 4L, 4L, 6L, 5L, 8L, 4L, 4L, 3L, 6L, 3L, 5L,
6L, 5L, 4L, 5L, 4L, 6L, 6L, 8L, 9L, 11L, 6L, 6L, 3L, 6L,
5L, 5L, 5L, 7L, 4L, 4L, 3L, 5L, 4L, 6L, 6L, 3L, 6L, 7L, 4L,
5L, 11L, 8L, 11L, 6L, 6L, 6L, 4L, 6L, 6L, 5L, 5L, 5L, 4L,
4L, 7L, 6L, 4L, 7L, 5L, 6L, 5L, 3L, 6L, 6L, 5L, 5L, 2L, 7L,
8L, 8L, 6L, 4L, 4L, 6L, 6L), .Label = c("0-4", "10-14", "15-19",
"20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "5-9",
"50-54"), class = "factor"), Inv_nodes = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 6L, 6L, 1L, 7L, 7L, 5L, 6L, 1L, 1L, 5L,
5L, 1L, 1L, 1L, 5L, 5L, 1L, 1L, 6L, 1L, 1L, 5L, 1L, 1L, 3L,
5L, 3L, 1L, 1L, 5L, 5L, 1L, 1L, 1L, 1L, 5L, 1L, 5L, 5L, 5L,
5L, 3L, 1L, 1L, 5L, 1L, 6L, 5L, 5L, 1L, 1L, 1L, 5L, 1L, 1L,
1L, 1L, 7L, 7L, 6L, 1L, 1L, 1L, 1L, 2L, 1L, 6L, 1L, 1L, 1L,
5L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L,
3L, 1L, 5L, 1L, 7L, 5L, 5L, 7L, 1L, 5L, 1L, 1L, 1L, 5L, 5L,
3L, 6L, 5L, 2L, 7L, 6L, 7L, 6L, 5L, 1L, 1L, 1L, 1L, 1L, 6L,
1L, 5L, 6L, 5L, 5L, 2L, 1L, 1L, 1L, 7L, 5L, 4L, 1L, 1L, 6L,
1L, 1L, 1L, 5L, 7L, 6L, 6L, 3L, 6L, 6L, 1L, 1L, 1L, 5L, 5L
), .Label = c("0-2", "12-14", "15-17", "24-26", "3-5", "6-8",
"9-11"), class = "factor"), Node_caps = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 2L, 3L, 2L, 2L, 3L, 3L,
2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 2L, 1L, 1L, 2L, 2L,
3L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L,
2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L,
2L, 3L, 2L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 3L,
2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L,
3L, 3L, 2L, 2L, 3L, 2L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 3L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L), .Label = c("?",
"no", "yes"), class = "factor"), Deg_malig = c(3L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 1L, 3L, 3L, 1L, 2L, 3L,
3L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 3L, 2L, 3L,
1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
2L, 1L, 1L, 1L, 3L, 3L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
2L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 3L, 1L, 2L, 3L, 2L,
2L, 1L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 2L, 1L, 2L, 2L,
3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 2L, 3L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 3L, 1L,
2L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 2L, 3L, 1L, 1L, 1L, 3L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L,
3L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
2L, 1L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 3L, 1L, 2L, 2L, 3L,
1L, 2L, 2L, 2L, 2L, 3L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 3L,
3L, 3L, 2L, 3L, 3L, 3L, 2L, 3L, 2L, 1L, 3L, 3L, 3L, 1L, 2L,
2L, 3L, 2L, 3L, 3L, 1L, 1L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 1L, 3L, 3L), Breast = structure(c(1L,
2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L
), .Label = c("left", "right"), class = "factor"), Breast_quad = structure(c(3L,
6L, 3L, 4L, 5L, 3L, 3L, 3L, 3L, 4L, 2L, 3L, 6L, 6L, 4L, 3L,
3L, 3L, 3L, 6L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 4L,
3L, 3L, 4L, 4L, 4L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 4L, 6L, 4L,
3L, 4L, 6L, 3L, 3L, 5L, 3L, 4L, 4L, 6L, 2L, 6L, 4L, 4L, 2L,
5L, 3L, 6L, 5L, 4L, 5L, 4L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 3L,
3L, 2L, 3L, 2L, 3L, 4L, 3L, 3L, 5L, 4L, 3L, 5L, 4L, 4L, 2L,
4L, 4L, 4L, 3L, 5L, 4L, 4L, 6L, 3L, 3L, 3L, 5L, 5L, 3L, 4L,
4L, 6L, 6L, 4L, 3L, 2L, 4L, 4L, 6L, 4L, 3L, 4L, 3L, 5L, 3L,
6L, 4L, 3L, 3L, 2L, 6L, 4L, 4L, 4L, 6L, 4L, 4L, 6L, 3L, 2L,
6L, 3L, 3L, 5L, 3L, 3L, 4L, 3L, 2L, 5L, 4L, 3L, 2L, 4L, 4L,
3L, 3L, 4L, 4L, 4L, 4L, 2L, 2L, 3L, 4L, 3L, 4L, 4L, 3L, 4L,
3L, 4L, 4L, 4L, 4L, 3L, 6L, 4L, 3L, 6L, 3L, 3L, 4L, 3L, 4L,
3L, 3L, 4L, 3L, 3L, 5L, 4L, 4L, 4L, 5L, 4L, 3L, 5L, 4L, 4L,
4L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 6L, 2L, 1L, 6L, 6L, 4L, 3L,
2L, 6L, 4L, 3L, 4L, 4L, 4L, 2L, 3L, 6L, 4L, 5L, 3L, 3L, 3L,
3L, 4L, 3L, 6L, 4L, 4L, 4L, 3L, 4L, 3L, 3L, 3L, 3L, 6L, 3L,
3L, 3L, 6L, 4L, 4L, 3L, 5L, 3L, 3L, 4L, 3L, 4L, 4L, 6L, 4L,
3L, 3L, 5L, 4L, 6L, 5L, 4L, 4L, 3L, 3L, 6L, 3L, 3L, 3L, 5L,
3L, 4L, 6L, 2L, 4L, 5L, 4L, 6L, 3L, 3L, 4L, 4L, 4L, 3L, 3L
), .Label = c("?", "central", "left_low", "left_up", "right_low",
"right_up"), class = "factor"), Irradiate = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L
), .Label = c("no", "yes"), class = "factor")), class = "data.frame", row.names = c(NA,
-286L))
Upvotes: 0
Views: 121
Reputation: 226087
Unless I'm missing something, you're working way harder than you have to.
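In your data, Tumor_size is already a factor, so as.numeric() will convert each value to its integer level code. Be aware that the levels are sorted alphabetically ("5-9" sits between "45-49" and "50-54"), so the codes follow that order rather than the numeric order of the ranges unless you re-level the factor first.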
table(as.numeric(Bcdata$Tumor_size))
1 2 3 4 5 6 7 8 9 10 11
8 28 30 50 54 60 19 22 3 4 8
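A minimal sketch of the factor approach on a small made-up vector (the sample values and the names tumor_size, size_levels and tumor_code are assumptions for illustration). It sets the level order explicitly, taken from the gsub() calls in the question, because factor() otherwise sorts levels alphabetically and would place "5-9" after "45-49".

```r
# Hypothetical sample of the Tumor_size column (not the full data set)
tumor_size <- c("30-34", "20-24", "0-4", "40-44", "5-9", "10-14")

# Intended order of the ranges, copied from the gsub() calls in the question
size_levels <- c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
                 "30-34", "35-39", "40-44", "45-49", "50-54", "55-59")

# Build a factor with explicit levels, then take the integer level codes
tumor_code <- as.integer(factor(tumor_size, levels = size_levels))
tumor_code
```

With the levels given explicitly, "5-9" maps to 2 and "40-44" maps to 9, which is the coding the question was aiming for. The same call should work on the real column, e.g. factor(Bcdata$Tumor_size, levels = size_levels), whether it is stored as character or as a factor.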
Upvotes: 2
Reputation: 2113
If you want a really quick solution, you could just change the pattern to match exactly:
Bcdata$Tumor_size=gsub('^0-4$',1,Bcdata$Tumor_size)
reference: Match exact string
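As a rough sketch, the anchoring can be applied to every range in a loop instead of writing twelve separate calls. The sample vector and the names ranges and recoded are assumptions for illustration; as.character() is there in case the real column is a factor.

```r
# Hypothetical sample values; the real column would be Bcdata$Tumor_size
tumor_size <- as.character(c("0-4", "40-44", "5-9", "45-49"))

# Ranges in the order they should be numbered (from the question)
ranges <- c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
            "30-34", "35-39", "40-44", "45-49", "50-54", "55-59")

recoded <- tumor_size
for (i in seq_along(ranges)) {
  # ^ and $ force the pattern to match the whole string, not a substring
  recoded <- gsub(paste0("^", ranges[i], "$"), i, recoded)
}
recoded
```

The result is still a character vector, so wrap it in as.integer() if numeric codes are needed.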
Upvotes: 2
Reputation: 5306
'40-44' is being changed to '414' by the first gsub call, because the pattern '0-4' matches the middle of that string:
Bcdata$Tumor_size=gsub('0-4',1,Bcdata$Tumor_size)
You should use a proper recoding function, or encode into a factor with the levels in the order you want, then use as.numeric to turn it into integer dummy values.
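A small illustration of the substring match, followed by one possible recoding via a named lookup vector. The names x, size_levels, lookup and codes are made up for this sketch, and the lookup vector is just one way to recode, not necessarily the function this answer had in mind.

```r
x <- c("0-4", "40-44")

# Unanchored pattern: "0-4" also occurs inside "40-44" (4 | 0-4 | 4)
broken <- gsub("0-4", 1, x)
broken

# Recoding by exact lookup: names are the ranges, values are the codes
size_levels <- c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29",
                 "30-34", "35-39", "40-44", "45-49", "50-54", "55-59")
lookup <- setNames(seq_along(size_levels), size_levels)

# Index by the character value so each range is matched as a whole
codes <- unname(lookup[as.character(x)])
codes
```

Because the lookup matches whole names only, no range can be altered by a pattern meant for another one, and the order of the entries does not matter.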
Upvotes: 4