Reputation: 31
I have two tables, say one is Sales History, the other is Key Clients.
The Sales History table looks like this(contains 500,000 rows):
Client No. Transaction Date Sales Amount
abc 1/12/2014 100
def 2/28/2014 200
hij 3/01/2014 300
abc 2/18/2014 400
abc 5/26/2014 500
xyz 7/15/2014 600
def 8/23/2014 700
hij 9/19/2014 800
And the Key Clients table looks like this:
Client No.
abc
def
xyz
Now what I want to do with R is that, based on the Key Clients table, I need to create a new table matching the earliest Sales Amount from the Sales History table. The desired results looks like this:
Client No. Earliest Sales Amount
abc 100
def 200
xyz 600
Upvotes: 3
Views: 867
Reputation: 1279
How about, using 'dplyr', something like the following? (not tested, using some easier-to-type labels)
sal_hist_first <- sales_hist %>%
mutate(trans_date=as.Date(trans_date, format="%m/%d/%Y")) %>%
group_by(client_no) %>%
summarise(earliest_sales_amount = sales_amount[trans_date == min(trans_date)])
result <- keys_clients %>% left_join(sal_hist_first)
Upvotes: 2
Reputation: 887118
If the 'Transaction Date' column is ordered, then we can use match
df2$EarliestSalesAmount <- df1[,3][match(df2[,1], df1[,1])]
Or we can use data.table
to join both the datasets after setting the key as 'Client_No.'. order
using 'Transaction_Date' after converting to 'Date' class, select the first row (.SD[1L]
) by 'Client_No', assign the unwanted columns to NULL
, and change the column names accordingly (setnames
)
library(data.table)
res <- setkey(setDT(df1), Client_No.)[df2][order(as.Date(Transaction_Date,
format='%m/%d/%Y')),.SD[1L] ,Client_No.][, 2:= NULL]
setnames(res, 2, 'Earliest_Sales_Amount')
# Client_No. Earliest_Sales_Amount
#1: abc 100
#2: def 200
#3: xyz 600
df1 <- structure(list(Client_No. = c("abc", "def", "hij", "abc",
"abc",
"xyz", "def", "hij"), Transaction_Date = c("1/12/2014", "2/28/2014",
"3/01/2014", "2/18/2014", "5/26/2014", "7/15/2014", "8/23/2014",
"9/19/2014"), Sales_Amount = c(100L, 200L, 300L, 400L, 500L,
600L, 700L, 800L)), .Names = c("Client_No.", "Transaction_Date",
"Sales_Amount"), class = "data.frame", row.names = c(NA, -8L))
df2 <- structure(list(Client_No. = c("abc", "def", "xyz")),
.Names = "Client_No.", class = "data.frame", row.names = c(NA, -3L))
Upvotes: 3
Reputation: 92292
I will also recommend using data.table
s binary join, but I will utilize .EACHI
for this
library(data.table)
setkey(setDT(df)[, Transaction.Date := as.IDate(Transaction.Date, "%m/%d/%Y")], Client.No.)
df[key, .(Earliest.Sales.Amont = Sales.Amount[which.min(Transaction.Date)]), by = .EACHI]
# Client.No. Earliest.Sales.Amont
# 1: abc 100
# 2: def 200
# 3: xyz 600
First, we will convert your data set (df
) to a data.table
class using setDT
, convert Transaction.Date
to a Date
class so R will understand what you mean while looking for the minimum date and then key it by Client.No.
in order to perform a binary join (this is very efficient for a big data set like yours). Then, while performing the binary join on the key table (key
) we will use by = .EACHI
in order to locate the Sale.Amount
within the earliest date in the data set.
Upvotes: 3
Reputation: 328
To give a more detailed answer it would be useful to know the used programming language/IDE and exact database system.
But assuming you use something like MySQL, MSSQL or PostgreSQL and you are able to execute SQL statements the following should give you an idea what to do.
First of all you could open the Sales History Table orderd by the Transaction date. Then you could iterate through it to find the respective Key clients. Or if your Database allows the TOP statement you could use a kind of the following SQL Statement for each Key Client:
SELECT TOP 1 Client_No, Sales_Amount FROM tbl_sales_history WHERE Client_No = '<KEY CLIENT>' ORDER BY Transaction_Date
This SQL Statement would return the first record for a key client of all the records ordered by the transaction date. Which is of course the earliest sale.
In short key to you question is to order your sales history by the transaction date.
EDIT: Just saw that the programming language used is R. So my answer is probably not that useful for this example, but it should work as a general help.
Upvotes: 2