Reputation: 1456
In the following script:
dataset <- read.csv("/home/adam/Desktop/Temp/lrtest.csv")
for(i in 3:ncol(dataset)){
uq <- unique(dataset[,i])
j <- i * 100
for(x in uq){
dataset[,i][dataset[,i] == x] <- j #dataset$nm[dataset$nm == x] <- j
j <- j + 1
}
}
I would like to go through each column and replace each of its string values with a number. The problem is that replacing the values (line 6) produces NA, as the output below shows.
How can I solve this?
The data:
Class      Branch  LA_type  Method_type    Method_call   Branch_type  Branch_condition  Tested_parameter
Goal       12      Smooth   public static  never called  IFNE         TRUE              String
TreeApp    20      Rugged   constructor    none          IF_ICMPGE    FALSE             int
Password   4       Smooth   private        never called  IFEQ         FALSE             int
XMLParser  9       Rugged   constructor    none          IFNONNULL    TRUE              String
MapClass   33      Smooth   public         never called  IFGT         FALSE             double
The output:
      Class Branch LA_type Method_type Method_call Branch_type Branch_condition Tested_parameter
1      Goal     12    <NA>        <NA>        <NA>        <NA>              700             <NA>
2   TreeApp     20    <NA>        <NA>        <NA>        <NA>              701             <NA>
3  Password      4    <NA>        <NA>        <NA>        <NA>              701             <NA>
4 XMLParser      9    <NA>        <NA>        <NA>        <NA>              700             <NA>
5  MapClass     33    <NA>        <NA>        <NA>        <NA>              701             <NA>
Upvotes: 2
Views: 50
Reputation: 388982
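We can use lapply to iterate over columns 3 through the last column of the data frame, convert each column to a factor (which it probably already is) whose levels follow the order of first appearance, and add the resulting integer codes, starting from 0, to the column index times 100.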
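# For each column index x from 3 onward: number the distinct values in order
# of first appearance (0, 1, 2, ...) and offset them by x * 100
df[3:ncol(df)] <- lapply(3:ncol(df), function(x)
  x * 100 + as.integer(factor(df[[x]], levels = unique(df[[x]]))) - 1)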
df
#      Class Branch LA_type Method_type Method_call Branch_type Branch_condition
#1      Goal     12     300         400         500         600              700
#2   TreeApp     20     301         401         501         601              701
#3  Password      4     300         402         500         602              701
#4 XMLParser      9     301         401         501         603              700
#5  MapClass     33     300         403         500         604              701
#  Tested_parameter
#1              800
#2              801
#3              801
#4              800
#5              802
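As for why the loop in the question gives NA: the string columns are most likely factors (read.csv converted strings to factors by default before R 4.0.0), and assigning a value that is not one of a factor's levels stores NA with a warning. Branch_condition is logical rather than a factor, which would explain why it is the only column that was replaced. A minimal sketch on a toy vector, where f, g, ch and codes are names made up for illustration:

```r
# Toy factor standing in for one string column of the data
f <- factor(c("Smooth", "Rugged", "Smooth"))

g <- f
g[g == "Smooth"] <- 300   # 300 is not a level of g: R warns and stores NA
is.na(g)                  # TRUE FALSE TRUE

# Working on the character values avoids the problem
ch <- as.character(f)
codes <- 300 + match(ch, unique(ch)) - 1
codes                     # 300 301 300
```

Reading the file with stringsAsFactors = FALSE should avoid the factors in the first place, though the replaced columns would then hold the numbers as character strings unless converted afterwards.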
data
df <- structure(list(Class = structure(c(1L, 4L, 3L, 5L, 2L), .Label = c("Goal",
"MapClass", "Password", "TreeApp", "XMLParser"), class = "factor"),
Branch = c(12L, 20L, 4L, 9L, 33L), LA_type = structure(c(2L,
1L, 2L, 1L, 2L), .Label = c("Rugged", "Smooth"), class = "factor"),
Method_type = structure(c(4L, 1L, 2L, 1L, 3L), .Label = c("constructor",
"private", "public", "public_static"), class = "factor"),
Method_call = structure(c(1L, 2L, 1L, 2L, 1L), .Label = c("never_called",
"none"), class = "factor"), Branch_type = structure(c(4L,
1L, 2L, 5L, 3L), .Label = c("IF_ICMPGE", "IFEQ", "IFGT",
"IFNE", "IFNONNULL"), class = "factor"), Branch_condition = c(TRUE,
FALSE, FALSE, TRUE, FALSE), Tested_parameter = structure(c(3L,
2L, 2L, 3L, 1L), .Label = c("double", "int", "String"), class = "factor")),
class = "data.frame", row.names = c(NA, -5L))
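If you would rather keep a loop like the one in the question, one option is to convert each column to character before recoding it. The following is only a sketch on a cut-down stand-in for the data (three rows, two of the string columns), and values is a helper name introduced here:

```r
# Cut-down stand-in for the question's data; values copied from its first rows
dataset <- data.frame(
  Class            = c("Goal", "TreeApp", "Password"),
  Branch           = c(12L, 20L, 4L),
  LA_type          = factor(c("Smooth", "Rugged", "Smooth")),
  Tested_parameter = factor(c("String", "int", "int"))
)

for (i in 3:ncol(dataset)) {
  # Drop the factor class so that numbers can be stored in the column
  values <- as.character(dataset[[i]])
  # Number the distinct values in order of first appearance, offset by i * 100
  dataset[[i]] <- i * 100 + match(values, unique(values)) - 1
}
dataset
```

Here match(values, unique(values)) plays the same role as the factor with levels = unique(...) above: both number the distinct values in order of first appearance.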
Upvotes: 2