Vector JX
Vector JX

Reputation: 179

Lookup value from dataframe with condition

I have a two data frames as mentioned below:

DF_1>
    Sr.No.  Stage           Time            Result          Result_2
    1       updated_date    1516868822411   1516868822361   1516868822350
    2       id              1516868822411   ABC             -
    3       engine_date     1516868822411   1516868822000   -
    4       blocked         1516868822411   80000           0
    5       updated_date    1516868822398   1516868822350   1516866877815
    6       list            1516868822398   BCD             -
    7       sub_stat_1      1516868779095   AC-12           AC-14
    8       status_1        1516868642468   AC-25           AC-38

DF_2>

Sr. No.     ID        Type_1 Type_2
1           AC-12      X      Y
2           AC-14      XX     YY
3           AC-25      A      B
4           AC-38      CC     CD

Now i want to vlookup value from DF_2 with below mentioned condition:

  1. If Stage is sub_stat_1 then vlookup Result and Result_2 from Type_2 (From DF_2).
  2. If stage is status_1 then vlookup Result and Result_2 from Type_1 (From DF_2).
  3. If stage is status_1 or sub_stat_1 but Result or Result_2 is then do nothing and give "-" value in output dataframe.
  4. keep other values same as DF_1 Result and Result_2 to Desired output column Final_1 and Final_2, Respectively.
  5. Whenever there are epoch time in Time, Result and Result_2 column if possible convert it to normal time in Desired output column Time_2, Final_1 and Final_2, Respectively.

Desired Output Dataframe:

Sr. No.    Stage   Time            Result          Result_2      Time_2                     Final_1                     Final_2
1      updated_date 1516868822411  1516868822361   1516868822350 25/01/2018 08:27:02        25/01/2018 08:27:02         25/01/2018 08:27:02
2      id           1516868822411  ABC             -             25/01/2018 08:27:02        ABC                         -
3      engine_date  1516868822411  1516868822000   -             25/01/2018 08:27:02        25/01/2018 08:27:02         -
4      blocked      1516868822411  80000           0             25/01/2018 08:27:02        80000                       0
5      updated_date 1516868822398  1516868822350   1516866877815 25/01/2018 08:27:02        25/01/2018 08:27:02         25/01/2018 07:54:38
6      list         1516868822398  BCD             -             25/01/2018 08:27:02        BCD                         -
7      sub_stat_1   1516868779095  AC-12           AC-14         25/01/2018 08:26:19        Y (Output of AC-12)         YY (Output of AC-14)
8      status_1     1516868642468  AC-25           AC-38         25/01/2018 08:24:02        A (Output of AC-25)         CC (Output of AC-38)

Upvotes: 0

Views: 690

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 145765

I'm assuming your data frame columns are as printed, and your string columns are character type, not factor. Convert them to character if they are not already. (See sample data at bottom.)

# first we will create empty FINAL columns
DF_1$Final_1 = NA
DF_1$Final_2 = NA

If Stage is sub_stat_1 then vlookup Result and Result_2 from Type_2 (From DF_2)

cond = DF_1[, 'Stage'] == "sub_stat_1"
from = "Type_2"
DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]

If stage is status_1 then vlookup Result and Result_2 from Type_1 (From DF_2).

cond = DF_1[, 'Stage'] == "status_1"
from = "Type_1"
DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]

If stage is status_1 or sub_stat_1 but Result or Result_2 is then do nothing and give "-" value in output dataframe.

I we initialized with missing values, NA. I'd encourage you to leave them as such, but if you really want you could do DF_1[is.na(DF_1)] = "-".

keep other values same as DF_1 Result and Result_2 to Desired output column Final_1 and Final_2, Respectively.

cond = ! DF_1[, 'Stage'] %in% c("status_1", "sub_stat_1")
DF_1[cond, "Final_1"] = DF_1[cond, "Result"]
DF_1[cond, "Final_2"] = DF_1[cond, "Result_2"]

Whenever there are epoch time in Time, Result and Result_2 column if possible convert it to normal time in Desired output column Time_2, Final_1 and Final_2, Respectively.

I'll leave this one to you - you can use as.POSIXct() on your epoch times if you supply the origin, but your integers look too big to me. You may want to format them before inserting them into the Final columns so you can control what they look like as they get converted to characters. If you need more help with this, ask a separate question.

DF_1
#   Sr.No.        Stage         Time        Result      Result_2       Final_1       Final_2
# 1      1 updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
# 2      2           id 1.516869e+12           ABC             -           ABC             -
# 3      3  engine_date 1.516869e+12 1516868822000             - 1516868822000             -
# 4      4      blocked 1.516869e+12         80000             0         80000             0
# 5      5 updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
# 6      6         list 1.516869e+12           BCD             -           BCD             -
# 7      7   sub_stat_1 1.516869e+12         AC-12         AC-14             Y            YY
# 8      8     status_1 1.516869e+12         AC-25         AC-38             A            CC

Using this data:

DF_1
#   Sr.No.  Stage         Time        Result      Result_2       Final_1       Final_2
# 1  updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
# 2            id 1.516869e+12           ABC             -           ABC             -
# 3   engine_date 1.516869e+12 1516868822000             - 1516868822000             -
# 4       blocked 1.516869e+12         80000             0         80000             0
# 5  updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
# 6          list 1.516869e+12           BCD             -           BCD             -
# 7    sub_stat_1 1.516869e+12         AC-12         AC-14             Y            YY
# 8      status_1 1.516869e+12         AC-25         AC-38             A            CC

Using this data:

DF_1 = read.table(text = "Sr.No.  Stage       Time            Result      

    Result_2
    1   updated_date    1516868822411   1516868822361   1516868822350
    2   id              1516868822411   ABC             -
    3   engine_date     1516868822411   1516868822000   -
    4   blocked         1516868822411   80000           0
    5   updated_date    1516868822398   1516868822350   1516866877815
    6   list            1516868822398   BCD             -
    7   sub_stat_1      1516868779095   AC-12           AC-14
    8   status_1        1516868642468   AC-25           AC-38", check.names = F, stringsAsFactors = FALSE, header = T)

DF_2 = read.table(text = "Sr.No. ID     Type_1 Type_2
1   AC-12      X      Y
2   AC-14      XX     YY
3   AC-25      A      B
4   AC-38      CC     CD", check.names = F, stringsAsFactors = FALSE, header = T)

Upvotes: 1

Related Questions