Srm Murty
Srm Murty

Reputation: 135

how to get previous matching value

how to get value of "a" if value of b matches with most recent previous value e.g - row$3 of b matches with previous row$1 ,row$6 matches with row$4

df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
           a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
           b = c(30.133,29,30.1223,33,17,33,11,17,14,13.913,14))

  year  a   b   *NEW*
2013    10  30.133  NA
2013    11  29      NA
2014    NA  30.1223 10
2014    13  33      NA
2014    22  17      NA
2015    NA  33      13
2015    19  11      NA
2015    NA  17      22
2016    10  14      NA
2016    15  13.913  10
2016    NA  14      15

Thanks

Upvotes: 0

Views: 77

Answers (1)

Prradep
Prradep

Reputation: 5696

For OPs example case

One way could be is to use duplicated() function.

# Input dataframe    
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
                     a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
                     b = c(30,29,30,33,17,33,11,17,14,14,14))

# creating a new column with default values
df$NEW <- NA

# updating the value using the previous matching position
df$NEW[duplicated(df$b)] <- df$a[duplicated(df$b,fromLast = TRUE)]

# expected output
df
#    year  a  b NEW
# 1  2013 10 30  NA
# 2  2013 11 29  NA
# 3  2014 NA 30  10
# 4  2014 13 33  NA
# 5  2014 22 17  NA
# 6  2015 NA 33  13
# 7  2015 19 11  NA
# 8  2015 NA 17  22
# 9  2016 10 14  NA
# 10 2016 15 14  10
# 11 2016 NA 14  15

General purpose usage

The above solution fails when the duplicates are not in sequential order. As per @DavidArenburg's advice. I have changed the fourth element df$b[4] <- 14. The general solution would require the usage of another handy function order() and should work for different possible cases.

# Input dataframe    
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
                 a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
                 b = c(30,29,30,14,17,33,11,17,14,14,14))

# creating a new column with default values
df$NEW <- NA

# sort the matching column
df <- df[order(df$b),]

# updating the value using the previous matching position
df$NEW[duplicated(df$b)] <- df$a[duplicated(df$b,fromLast = TRUE)]

# To original order
df <- df[order(as.integer(rownames(df))),]

# expected output
df
#    year  a  b NEW
# 1  2013 10 30  NA
# 2  2013 11 29  NA
# 3  2014 NA 30  10
# 4  2014 13 14  NA
# 5  2014 22 17  NA
# 6  2015 NA 33  NA
# 7  2015 19 11  NA
# 8  2015 NA 17  22
# 9  2016 10 14  13
# 10 2016 15 14  10
# 11 2016 NA 14  15

Here, the solution is based on the base package' functions. I am sure there should other ways of doing this using other packages.

Upvotes: 3

Related Questions