JaviLarry

Reputation: 23

Compare values in the same column and add the result in a second column in R

I am comparing each value in a column with the values in the neighbouring rows and, depending on the comparison, I want to assign a result. The table is shown below. I have the IDs and I want to produce the values in the column "Result". The column "What I have" is what my code currently returns.

Conditions for the result in row n:
If ID(n) is different from the ID of the previous row (n-1), then Result(n) is A.
If ID(n) is the same as the ID of the previous row and different from the ID of the next row (n+1), then Result(n) is C.
In all other cases, Result(n) is B.

l <- ifelse ((df$ID[1:(nrow(df)-1)] != df$ID[2:(nrow(df)+0)]),print("A"),
ifelse(((df$ID[1:(nrow(df)-1)] == df$ID[2:(nrow(df)+0)]) & (df$ID[2:(nrow(df)+0)] != df$ID[3:(nrow(df)+1)])), print ("C"),print ("B")))

    df
    ID Result What I have
1   12   A       A
2   13   A       A
3   14   A       A
4   15   A       B
5   15   B       B
6   15   B       B
7   15   B       B
8   15   B       C
9   15   C       A
10  16   A      NA
11  17   A      NA

Thanks a lot in advance

Upvotes: 1

Views: 893

Answers (2)

akrun

Reputation: 887158

We could also use rleid from data.table. We group by the run-length id of the 'ID' variable, so that each run of consecutive identical IDs forms one group. Within each group, if the number of rows is greater than 1 (.N > 1), we concatenate 'A', 'B' repeated .N - 2 times, and 'C'; otherwise we return 'A'.

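library(data.table)  # rleid() needs v1.9.6+
# Group by runs of consecutive identical IDs and label each run
# 'A', 'B', ..., 'C' (or just 'A' when the run has a single row)
setDT(df)[, Result := if (.N > 1) c('A', rep('B', .N - 2), 'C') else 'A',
          by = rleid(ID)]
df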
#    ID Result
# 1: 12      A
# 2: 13      A
# 3: 14      A
# 4: 15      A
# 5: 15      B
# 6: 15      B
# 7: 15      B
# 8: 15      B
# 9: 15      C
#10: 16      A
#11: 17      A

data

df <- data.frame(ID= c(12:14, rep(15, 6), 16:17))
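
For readers without data.table, the same grouping idea can be sketched in base R with rle. This is only an illustration of the run-length approach described above, not part of the original answer; the names run_id and label_run are made up for this sketch.

```r
# Same example data as above
df <- data.frame(ID = c(12:14, rep(15, 6), 16:17))

# rle() gives the length of each run of consecutive identical IDs
runs <- rle(df$ID)

# A run id per row, comparable to what rleid(ID) returns (hypothetical name)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Label one run of length n: 'A' alone, or 'A', some 'B's, then 'C'
# (label_run is a hypothetical helper for this sketch)
label_run <- function(n) if (n > 1) c("A", rep("B", n - 2), "C") else "A"

# Runs are consecutive, so concatenating the labels run by run keeps row order
df$Result <- unlist(lapply(runs$lengths, label_run))
df
```

Because each run is a block of adjacent rows, no regrouping or reordering is needed; a run of length 2 simply gets 'A' and 'C' since rep("B", 0) is empty.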

Upvotes: 1

LyzandeR

Reputation: 37879

Using lag and lead from dplyr together with ifelse, you could do it this way:

library(dplyr)
df$Result <- ifelse(df$ID != lag(df$ID) | is.na(lag(df$ID)), 'A',
                    ifelse(df$ID == lag(df$ID) & df$ID != lead(df$ID), 'C', 'B' ))

Output:

> df
   ID Result
1  12      A
2  13      A
3  14      A
4  15      A
5  15      B
6  15      B
7  15      B
8  15      B
9  15      C
10 16      A
11 17      A

A few words of clarification: lag shifts the column down by one row, so each row sees the value of the previous row, whereas lead does the opposite and shifts the column up by one row, so each row sees the value of the next row. Print lag(df$ID) and lead(df$ID) to see this.
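
To make the shifting concrete without loading dplyr, here is a small base R sketch. The vectors prev_id and next_id are hypothetical stand-ins that hold the same values lag(df$ID) and lead(df$ID) would return for this data. It also guards one edge case: with the nested ifelse above, a last row that repeats the previous ID would get NA, because lead gives NA there.

```r
ID <- c(12:14, rep(15, 6), 16:17)

# Base R stand-ins (hypothetical names) for dplyr::lag and dplyr::lead
prev_id <- c(NA, head(ID, -1))  # value from the row before, NA in row 1
next_id <- c(tail(ID, -1), NA)  # value from the row after, NA in the last row

# Side-by-side view of the shifts
data.frame(ID, prev_id, next_id)

# Treat a missing neighbour as "different" so the first row starts a run
# and the last row ends one
is_start <- is.na(prev_id) | ID != prev_id
is_end   <- is.na(next_id) | ID != next_id

Result <- ifelse(is_start, "A", ifelse(is_end, "C", "B"))
Result
```

For the example data this gives the same labels as the output shown above; the only difference would appear if the data ended in the middle of a run of equal IDs.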

Upvotes: 1
