Merging two data.table while filtering for unique ID: only NA as answers

Question

My problem is as follows: I need to analyse data from several different files with a lot of entries (up to 500,000 rows per column, 10 columns in total). The files are connected through IDs, e.g. ORDER_IDs. However, an ID can appear multiple times, e.g. when an order contains multiple order lines. It is also possible that an ID doesn't appear in one of the files, e.g. because a file with sales data only has information on the orders already shipped, but not on those that have not been shipped yet.

So I have files of different lengths, and the IDs identifying each position may or may not be present in any given file. What I want now is to filter one file by ID so that it only shows the IDs listed in another file. The additional columns from the first file should be carried over as well.

Example of what I have:

dt1:

ORDER_ID        SKU_ID          Quantity_Shipped
12345            678910           100
12346            648392            30
64739            648392            20

dt2:

ORDER_ID        Country
12345              DE
12346              DE
55430              SE
90632              JPN
76543              ARG
64739              CH

What I want:

ORDER_ID        SKU_ID          Quantity_Shipped     Country
12345            678910           100                 DE
12346            648392            30                 DE
64739            648392            20                 CH
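
For reference, the desired table can be reproduced from the two example tables with a keyed merge. This is only a minimal sketch, assuming the example data is typed in by hand as below (the real data comes from csv files) and that ORDER_ID has the same type in both tables:

```r
library(data.table)

# Example tables from above, entered by hand for illustration
dt1 <- data.table(
  ORDER_ID = c(12345L, 12346L, 64739L),
  SKU_ID = c(678910L, 648392L, 648392L),
  Quantity_Shipped = c(100L, 30L, 20L)
)
dt2 <- data.table(
  ORDER_ID = c(12345L, 12346L, 55430L, 90632L, 76543L, 64739L),
  Country = c("DE", "DE", "SE", "JPN", "ARG", "CH")
)

# Inner join on ORDER_ID: only IDs present in both tables are kept,
# and Country is carried over from dt2
wanted <- merge(dt1, dt2, by = "ORDER_ID")
print(wanted)
```

Because merge() defaults to all = FALSE, IDs that exist only in dt2 (55430, 90632, 76543) are dropped rather than filled with NA.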

Originally, the data was imported from a csv file. The approach I have used so far worked when merging two files. However, when I try to add the information from a third file, I get only NA as answers. What can I do to fix this?

This is the approach I used so far.

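library(data.table)

# Note the crossed names: df2/dt1 hold the sales IDs, df1/dt2 hold the order lines.
df2 <- data.frame(ORDER_ID = sales[["ORDER_ID"]])
df1 <- data.frame(ORDER_ID = OL[["ORDER_ID"]],
                  SKU_ID = OL[["SKU_ID"]],
                  QTY_SHIPPED = OL[["QTY_SHIPPED"]],
                  EXPECTED_VOLUME = OL[["EXPECTED_VOLUME"]])
dt2 <- data.table(df1)
dt1 <- data.table(df2)

# For each ORDER_ID in dt1, take the first matching row of dt2.
# match() returns NA where an ID is missing from dt2, which yields a row of NAs.
dt3 <- dt2[match(dt1$ORDER_ID, dt2$ORDER_ID), ]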
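
One way NA rows can arise with this approach is sketched below. The tables order_lines and sales_ids are hypothetical stand-ins for OL and sales, not the real data, so this may or may not be what happens with the third file. It shows that match() gives NA for IDs without a partner and only ever returns the first matching row, and compares that with a data.table join that drops unmatched IDs:

```r
library(data.table)

# Hypothetical stand-ins for the order lines (OL) and the sales IDs
order_lines <- data.table(
  ORDER_ID = c(12345L, 12345L, 12346L),
  SKU_ID = c(678910L, 111111L, 648392L)
)
sales_ids <- data.table(ORDER_ID = c(12345L, 12346L, 99999L))

# match() gives the position of the FIRST match, or NA if there is none
idx <- match(sales_ids$ORDER_ID, order_lines$ORDER_ID)
print(idx)

# Indexing with an NA position produces a row that is NA in every column
by_match <- order_lines[idx, ]
print(by_match)

# Join on ORDER_ID instead: nomatch = NULL drops IDs without a partner,
# and every order line of a matching order is kept
by_join <- order_lines[sales_ids, on = "ORDER_ID", nomatch = NULL]
print(by_join)
```

In this sketch the join keeps both order lines of order 12345, while the match() version keeps only the first one and adds an all-NA row for the unmatched ID 99999.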
