Mahesh

Reputation: 3997

Error in creating new column in dataframe in R

I have a df as shown below:

     id type start end features
1     5 word     1   2       NN
2     6 word     3   3        .
3     7 word     5  12       NN
4     8 word    14  19      VBZ
5     9 word    21  30       NN
6    10 word    32  32      WDT
7    11 word    34  37      VBP
8    12 word    39  41       IN
9    13 word    43  44       IN
10   14 word    46  46       DT

I want to create a new column "sum" containing the row-wise sum of the values in 'start' and 'end'.

I have created the following function:

    mySum <- function(row) {
      row["start"]+row["end"]
    }
    df$sum <- apply(df,1, mySum );

But when I run this I get the following error:

    Error in row["start"] + row["end"] : 
      non-numeric argument to binary operator

But if I keep only row["start"] or row["end"] in the function, the column gets created.

I have also tried to force each value in the columns to be numeric:

    df$start = as.integer(as.vector(df$start));
    df$end = as.integer(as.vector(df$end));

But I still get the same error, and only when I add the values.

The structure of my data frame, as printed by dput(droplevels(head(df,10))), is as follows:

    structure(list(id = 5:14, type = c("word", "word", "word", "word", 
    "word", "word", "word", "word", "word", "word"), start = c(1L, 
    3L, 5L, 14L, 21L, 32L, 34L, 39L, 43L, 46L), end = c(2L, 3L, 12L, 
    19L, 30L, 32L, 37L, 41L, 44L, 46L), features = list(structure(list(
        POS = "NN"), .Names = "POS"), structure(list(POS = "."), .Names = "POS"), 
        structure(list(POS = "NN"), .Names = "POS"), structure(list(
            POS = "VBZ"), .Names = "POS"), structure(list(POS = "NN"), .Names = "POS"), 
        structure(list(POS = "WDT"), .Names = "POS"), structure(list(
            POS = "VBP"), .Names = "POS"), structure(list(POS = "IN"), .Names = "POS"), 
        structure(list(POS = "IN"), .Names = "POS"), structure(list(
            POS = "DT"), .Names = "POS"))), .Names = c("id", "type", 
    "start", "end", "features"), row.names = c(NA, 10L), class = "data.frame")

Upvotes: 0

Views: 489

Answers (1)

akrun

Reputation: 886938

Just do

    df1$Sum <- df1[, 'start'] + df1[, 'end']
    df1$Sum
    #[1]  3  6 17 33 51 64 71 80 87 92

Or

    rowSums(df1[c('start', 'end')], na.rm=TRUE)
    #1  2  3  4  5  6  7  8  9 10 
    #3  6 17 33 51 64 71 80 87 92 
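As a side note on why the original apply() call fails: apply() coerces a data frame to a matrix with as.matrix() before iterating over rows, so when non-numeric columns such as 'type' are present, the rows handed to the function are no longer numeric. A minimal sketch, assuming a small stand-in data frame built from the first three rows of the question (the 'features' list column is left out for brevity):

```r
# Stand-in for the question's data (first three rows; 'features' omitted)
df1 <- data.frame(id = 5:7,
                  type = c("word", "word", "word"),
                  start = c(1L, 3L, 5L),
                  end = c(2L, 3L, 12L),
                  stringsAsFactors = FALSE)

# apply() operates on as.matrix(df1); with a character column present,
# the whole matrix becomes character
is_char <- is.character(as.matrix(df1))
is_char
# TRUE

# Passing only the numeric columns keeps each row numeric,
# so the original row-wise function works unchanged
mySum <- function(row) row["start"] + row["end"]
row_sums <- unname(apply(df1[c("start", "end")], 1, mySum))
row_sums
# 3 6 17
```

The vectorised forms above are still the simpler choice; this only shows that the row-wise function is fine once apply() sees numeric columns alone.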

The error suggests that you have non-numeric columns. Check str(df1). If the class of 'start' or 'end' is factor or character, convert the columns to numeric and then apply the code above. For example, if the columns are factors, convert them with

    df1[c('start', 'end')] <- lapply(df1[c('start', 'end')],
                   function(x) as.numeric(as.character(x)))

For character columns, as.numeric alone is enough.
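To illustrate the difference between the two conversions, here is a small sketch. The data frame df2 and its values are hypothetical, chosen only so that one column is a factor and the other is character:

```r
# Hypothetical data: 'start' stored as a factor, 'end' stored as character
df2 <- data.frame(start = factor(c("1", "3", "5")),
                  end = c("2", "3", "12"),
                  stringsAsFactors = FALSE)

# Inspect the column classes before converting
col_classes <- sapply(df2, class)
col_classes

# as.numeric() on a factor returns the internal level codes (1, 2, 3 here),
# not the labels, so convert through as.character() first
codes <- as.numeric(df2$start)
df2$start <- as.numeric(as.character(df2$start))

# A character column can be converted directly
df2$end <- as.numeric(df2$end)

df2$Sum <- df2$start + df2$end
df2$Sum
# 3 6 17
```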

Upvotes: 1
