Reputation: 179
I have a two data frames as mentioned below:
DF_1>
Sr.No. Stage Time Result Result_2
1 updated_date 1516868822411 1516868822361 1516868822350
2 id 1516868822411 ABC -
3 engine_date 1516868822411 1516868822000 -
4 blocked 1516868822411 80000 0
5 updated_date 1516868822398 1516868822350 1516866877815
6 list 1516868822398 BCD -
7 sub_stat_1 1516868779095 AC-12 AC-14
8 status_1 1516868642468 AC-25 AC-38
DF_2>
Sr. No. ID Type_1 Type_2
1 AC-12 X Y
2 AC-14 XX YY
3 AC-25 A B
4 AC-38 CC CD
Now i want to vlookup value from DF_2 with below mentioned condition:
Desired Output Dataframe:
Sr. No. Stage Time Result Result_2 Time_2 Final_1 Final_2
1 updated_date 1516868822411 1516868822361 1516868822350 25/01/2018 08:27:02 25/01/2018 08:27:02 25/01/2018 08:27:02
2 id 1516868822411 ABC - 25/01/2018 08:27:02 ABC -
3 engine_date 1516868822411 1516868822000 - 25/01/2018 08:27:02 25/01/2018 08:27:02 -
4 blocked 1516868822411 80000 0 25/01/2018 08:27:02 80000 0
5 updated_date 1516868822398 1516868822350 1516866877815 25/01/2018 08:27:02 25/01/2018 08:27:02 25/01/2018 07:54:38
6 list 1516868822398 BCD - 25/01/2018 08:27:02 BCD -
7 sub_stat_1 1516868779095 AC-12 AC-14 25/01/2018 08:26:19 Y (Output of AC-12) YY (Output of AC-14)
8 status_1 1516868642468 AC-25 AC-38 25/01/2018 08:24:02 A (Output of AC-25) CC (Output of AC-38)
Upvotes: 0
Views: 690
Reputation: 145765
I'm assuming your data frame columns are as printed, and your string columns are character
type, not factor
. Convert them to character
if they are not already. (See sample data at bottom.)
# first we will create empty FINAL columns
DF_1$Final_1 = NA
DF_1$Final_2 = NA
If Stage is sub_stat_1 then vlookup Result and Result_2 from Type_2 (From DF_2)
cond = DF_1[, 'Stage'] == "sub_stat_1"
from = "Type_2"
DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]
If stage is status_1 then vlookup Result and Result_2 from Type_1 (From DF_2).
cond = DF_1[, 'Stage'] == "status_1"
from = "Type_1"
DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]
If stage is status_1 or sub_stat_1 but Result or Result_2 is then do nothing and give "-" value in output dataframe.
I we initialized with missing values, NA
. I'd encourage you to leave them as such, but if you really want you could do DF_1[is.na(DF_1)] = "-"
.
keep other values same as DF_1 Result and Result_2 to Desired output column Final_1 and Final_2, Respectively.
cond = ! DF_1[, 'Stage'] %in% c("status_1", "sub_stat_1")
DF_1[cond, "Final_1"] = DF_1[cond, "Result"]
DF_1[cond, "Final_2"] = DF_1[cond, "Result_2"]
Whenever there are epoch time in Time, Result and Result_2 column if possible convert it to normal time in Desired output column Time_2, Final_1 and Final_2, Respectively.
I'll leave this one to you - you can use as.POSIXct()
on your epoch times if you supply the origin, but your integers look too big to me. You may want to format
them before inserting them into the Final columns so you can control what they look like as they get converted to characters. If you need more help with this, ask a separate question.
DF_1
# Sr.No. Stage Time Result Result_2 Final_1 Final_2
# 1 1 updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
# 2 2 id 1.516869e+12 ABC - ABC -
# 3 3 engine_date 1.516869e+12 1516868822000 - 1516868822000 -
# 4 4 blocked 1.516869e+12 80000 0 80000 0
# 5 5 updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
# 6 6 list 1.516869e+12 BCD - BCD -
# 7 7 sub_stat_1 1.516869e+12 AC-12 AC-14 Y YY
# 8 8 status_1 1.516869e+12 AC-25 AC-38 A CC
Using this data:
DF_1
# Sr.No. Stage Time Result Result_2 Final_1 Final_2
# 1 updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
# 2 id 1.516869e+12 ABC - ABC -
# 3 engine_date 1.516869e+12 1516868822000 - 1516868822000 -
# 4 blocked 1.516869e+12 80000 0 80000 0
# 5 updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
# 6 list 1.516869e+12 BCD - BCD -
# 7 sub_stat_1 1.516869e+12 AC-12 AC-14 Y YY
# 8 status_1 1.516869e+12 AC-25 AC-38 A CC
Using this data:
DF_1 = read.table(text = "Sr.No. Stage Time Result
Result_2
1 updated_date 1516868822411 1516868822361 1516868822350
2 id 1516868822411 ABC -
3 engine_date 1516868822411 1516868822000 -
4 blocked 1516868822411 80000 0
5 updated_date 1516868822398 1516868822350 1516866877815
6 list 1516868822398 BCD -
7 sub_stat_1 1516868779095 AC-12 AC-14
8 status_1 1516868642468 AC-25 AC-38", check.names = F, stringsAsFactors = FALSE, header = T)
DF_2 = read.table(text = "Sr.No. ID Type_1 Type_2
1 AC-12 X Y
2 AC-14 XX YY
3 AC-25 A B
4 AC-38 CC CD", check.names = F, stringsAsFactors = FALSE, header = T)
Upvotes: 1