Adam Amin

Replace a value in a column based on column number

In the following script:

  dataset <- read.csv("/home/adam/Desktop/Temp/lrtest.csv")
  for(i in 3:ncol(dataset)){
    uq <- unique(dataset[,i])
    j <- i * 100
    for(x in uq){
      dataset[,i][dataset[,i] == x] <- j #dataset$nm[dataset$nm == x] <- j
      j <- j + 1
    }
  }

I would like to go through each column and replace each of its string values with a number. The problem is that replacing the values (line 6 of the script) results in NA, as the output below shows.

How can I solve it?

The data:

Class       Branch  LA_type  Method_type    Method_call   Branch_type  Branch_condition  Tested_parameter
Goal        12      Smooth   public static  never called  IFNE         TRUE              String
TreeApp     20      Rugged   constructor    none          IF_ICMPGE    FALSE             int
Password    4       Smooth   private        never called  IFEQ         FALSE             int
XMLParser   9       Rugged   constructor    none          IFNONNULL    TRUE              String
MapClass    33      Smooth   public         never called  IFGT         FALSE             double

The output:

      Class Branch LA_type Method_type Method_call Branch_type Branch_condition Tested_parameter
1      Goal     12    <NA>        <NA>        <NA>        <NA>              700             <NA>
2   TreeApp     20    <NA>        <NA>        <NA>        <NA>              701             <NA>
3  Password      4    <NA>        <NA>        <NA>        <NA>              701             <NA>
4 XMLParser      9    <NA>        <NA>        <NA>        <NA>              700             <NA>
5  MapClass     33    <NA>        <NA>        <NA>        <NA>              701             <NA>

Answers (1)

Ronak Shah

We can use lapply to iterate over column 3 through the last column of the data frame, convert each column to a factor (which it probably already is) whose levels follow the order of first appearance, and add the zero-based level code to the column number multiplied by 100.

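# For each column index x from 3 onward, number the distinct values in order of
# first appearance (0, 1, 2, ...) and offset them by x * 100.
df[3:ncol(df)] <- lapply(3:ncol(df), function(x) 
       x * 100 + as.integer(factor(df[[x]], levels = unique(df[[x]]))) - 1)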

df
#      Class Branch LA_type Method_type Method_call Branch_type Branch_condition
#1      Goal     12     300         400         500         600              700
#2   TreeApp     20     301         401         501         601              701
#3  Password      4     300         402         500         602              701
#4 XMLParser      9     301         401         501         603              700
#5  MapClass     33     300         403         500         604              701

#  Tested_parameter
#1              800
#2              801
#3              801
#4              800
#5              802
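
As for why the original loop produced NA: the likely cause, assuming read.csv() turned the string columns into factors (its default before R 4.0.0), is that assigning a value that is not an existing factor level yields NA. Branch_condition is logical rather than a factor, which would explain why it was the only column that worked. The toy vector f below is a hypothetical stand-in for one column of the data, not the real file.

```r
# Toy factor column, standing in for one column of `dataset` (hypothetical data).
f <- factor(c("Smooth", "Rugged", "Smooth"))

# Assigning a value that is not an existing level produces NA
# (R also emits an "invalid factor level" warning here).
g <- f
g[g == "Smooth"] <- 300
print(g)

# Converting to character first lets the replacement go through;
# the number is stored as the string "300".
ch <- as.character(f)
ch[ch == "Smooth"] <- 300
print(ch)
```

Reading the file with stringsAsFactors = FALSE, or converting each column with as.character() before the replacement, should avoid the NA values in the original loop.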

data

df <- structure(list(Class = structure(c(1L, 4L, 3L, 5L, 2L), .Label = c("Goal", 
"MapClass", "Password", "TreeApp", "XMLParser"), class = "factor"), 
Branch = c(12L, 20L, 4L, 9L, 33L), LA_type = structure(c(2L, 
1L, 2L, 1L, 2L), .Label = c("Rugged", "Smooth"), class = "factor"), 
Method_type = structure(c(4L, 1L, 2L, 1L, 3L), .Label = c("constructor", 
"private", "public", "public_static"), class = "factor"), 
Method_call = structure(c(1L, 2L, 1L, 2L, 1L), .Label = c("never_called", 
"none"), class = "factor"), Branch_type = structure(c(4L, 
1L, 2L, 5L, 3L), .Label = c("IF_ICMPGE", "IFEQ", "IFGT", 
"IFNE", "IFNONNULL"), class = "factor"), Branch_condition = c(TRUE, 
FALSE, FALSE, TRUE, FALSE), Tested_parameter = structure(c(3L, 
2L, 2L, 3L, 1L), .Label = c("double", "int", "String"), class = "factor")), 
class = "data.frame", row.names = c(NA, -5L))
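
For comparison, here is a minimal sketch of the original loop rewritten with match() instead of the inner loop. It is one possible variant, not the method used above, and the small data frame is a hypothetical subset of the question's data built inline so the snippet runs on its own.

```r
# Small stand-in for the question's data (hypothetical subset of columns).
dataset <- data.frame(
  Class            = c("Goal", "TreeApp", "Password"),
  Branch           = c(12L, 20L, 4L),
  LA_type          = c("Smooth", "Rugged", "Smooth"),
  Tested_parameter = c("String", "int", "int"),
  stringsAsFactors = TRUE  # mimic the read.csv() default before R 4.0.0
)

for (i in 3:ncol(dataset)) {
  # Drop the factor levels before recoding.
  values <- as.character(dataset[[i]])
  # match() gives the position of each value among the distinct values,
  # in order of first appearance: 1, 2, 3, ...
  dataset[[i]] <- i * 100 + match(values, unique(values)) - 1
}

print(dataset)
```

Because each column is replaced in a single assignment, no value is ever written into an existing factor, so the invalid-level problem does not arise.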
