Kryo

Reputation: 933

How to insert a new column in a dataset with values that depend on whether each row satisfies a condition

I have a large dataset and want to add a new column of binary values (0 and 1) based on the following criteria.

Insert 1 in df1$Occurance if df1$seg.mean >= 0.5 and df1$id == "gain", or if df1$seg.mean <= -0.5 and df1$id == "loss". For rows that do not satisfy either criterion, set df1$Occurance to 0.

df1 <-
    Chr start       end     num.mark    seg.mean    id
    1   68580000    68640000    8430    0.7       gain
    1   115900000   116260000   8430    0.0039    loss
    1   173500000   173680000   5      -1.7738    loss
    1   173500000   173680000   12       0.011    loss
    1   173840000   174010000   6      -1.6121    loss
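
To make the example reproducible, the sample above can be rebuilt as a data frame. This is a minimal sketch that assumes id is stored as a character column (not a factor) and that the numeric columns are plain numerics; the values are copied from the table.

```r
# Rebuild the sample data shown above (values copied from the table)
df1 <- data.frame(
  Chr      = c(1, 1, 1, 1, 1),
  start    = c(68580000, 115900000, 173500000, 173500000, 173840000),
  end      = c(68640000, 116260000, 173680000, 173680000, 174010000),
  num.mark = c(8430, 8430, 5, 12, 6),
  seg.mean = c(0.7, 0.0039, -1.7738, 0.011, -1.6121),
  id       = c("gain", "loss", "loss", "loss", "loss"),
  stringsAsFactors = FALSE  # assumption: keep id as character
)
```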

Desired output:

    Chr start       end     num.mark    seg.mean  id    Occurance
    1   68580000    68640000    8430    0.7       gain      1
    1   115900000   116260000   8430    0.0039    loss      0
    1   173500000   173680000   5      -1.7738    loss      1
    1   173500000   173680000   12       0.011    loss      0
    1   173840000   174010000   6      -1.6121    loss      1

Upvotes: 2

Views: 117

Answers (3)

Rentrop

Reputation: 21497

Try using ifelse

df1$Occurance <- ifelse((df1$seg.mean >= 0.5 & df1$id == "gain") | 
                      (df1$seg.mean <= -0.5 & df1$id == "loss"), 1, 0)

Edit: To avoid ifelse, and to avoid writing df1 every time, you can use transform:

transform(df1, Occurance = as.numeric((seg.mean >= 0.5 & id == "gain") |
                                        (seg.mean <= -0.5 & id == "loss")))

Comment: If you also accept TRUE/FALSE instead of 1/0, you can skip the as.numeric.
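
As a rough illustration of the TRUE/FALSE variant, here is a sketch on a cut-down stand-in for df1 (only the seg.mean and id columns, with values taken from the question). A logical column can be summed or used directly as a row filter, which is one reason it may be enough.

```r
# Cut-down stand-in for df1 (assumption: only the two columns the test needs)
df1 <- data.frame(
  seg.mean = c(0.7, 0.0039, -1.7738, 0.011, -1.6121),
  id       = c("gain", "loss", "loss", "loss", "loss"),
  stringsAsFactors = FALSE
)

# Keep the flag as a logical column instead of converting it to 1/0
df1 <- transform(df1, Occurance = (seg.mean >= 0.5 & id == "gain") |
                                  (seg.mean <= -0.5 & id == "loss"))

n_hits <- sum(df1$Occurance)   # TRUE counts as 1, so this counts matching rows
hits   <- df1[df1$Occurance, ] # the logical column works directly as a row filter
```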

Edit #2: If you want multiple outcomes such as -1, 0, and 1, you can do the following:

df1$Occurance = 0
within(df1, {Occurance[seg.mean >= 0.5 & id == "gain"] <- 1;
             Occurance[seg.mean <= -0.5 & id == "loss"] <- -1})

which results in

  Chr     start       end num.mark seg.mean   id Occurance
1   1  68580000  68640000     8430   0.7000 gain         1
2   1 115900000 116260000     8430   0.0039 loss         0
3   1 173500000 173680000        5  -1.7738 loss        -1
4   1 173500000 173680000       12   0.0110 loss         0
5   1 173840000 174010000        6  -1.6121 loss        -1
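
Note that within() returns a modified copy rather than changing df1 in place, so the result has to be assigned back if you want to keep it. A minimal sketch on a cut-down stand-in for df1 (only seg.mean and id, values from the question):

```r
# Cut-down stand-in for df1 (assumption: only the two columns the test needs)
df1 <- data.frame(
  seg.mean = c(0.7, 0.0039, -1.7738, 0.011, -1.6121),
  id       = c("gain", "loss", "loss", "loss", "loss"),
  stringsAsFactors = FALSE
)

df1$Occurance <- 0
# within() returns a new data frame, so assign the result back to df1
df1 <- within(df1, {
  Occurance[seg.mean >= 0.5 & id == "gain"]  <- 1
  Occurance[seg.mean <= -0.5 & id == "loss"] <- -1
})
```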

Upvotes: 4

User7598

Reputation: 1678

You can also do:

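   # Set matching rows to 1; the remaining rows are NA at this point
   df1$Occurance[with(df1, (seg.mean >= .5 & id == "gain") | (seg.mean <= -.5 & id == "loss"))] <- 1
   # Replace the remaining NA values with 0
   df1$Occurance[is.na(df1$Occurance)] <- 0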

Upvotes: -1

dimitris_ps

Reputation: 5951

Try this:

df1$Occurance <- ((df1$seg.mean >= 0.5 & df1$id == "gain") | 
                  (df1$seg.mean <= -0.5 & df1$id == "loss"))*1

# TRUE*1 = 1
# FALSE*1 = 0
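
One detail worth knowing: multiplying by 1 gives a numeric (double) column. If an integer column is preferred, as.integer() is a possible alternative. A small sketch on a hypothetical logical vector named flag:

```r
# flag is a hypothetical logical vector standing in for the condition result
flag <- c(TRUE, FALSE, TRUE)

as_double  <- flag * 1          # numeric (double) vector: 1 0 1
as_integer <- as.integer(flag)  # integer vector: 1 0 1
```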

Upvotes: 2
