Reputation: 739
I just discovered that summarizing a data table (I am relatively new to r and data.table) doesn't result in a sorted data tabe, nor does a join:
starting data table:
> DailyDataICount
ID week WeekDay Qty
1: 33739 1746 2 9369
2: 33739 1746 3 9981
3: 33739 1746 4 10109
4: 33739 1746 5 9012
5: 33739 1746 6 10387
6: 33739 1746 7 2
7: 33739 1747 1 7264
8: 33739 1747 2 9195
9: 33739 1747 3 8865
10: 33739 1747 4 8613
11: 33739 1747 5 9154
12: 33739 1747 6 9754
13: 33739 1747 7 11565
14: 33739 1748 1 9210
15: 33739 1748 2 8579
16: 33739 1748 3 8406
17: 33739 1748 4 8161
18: 33739 1748 5 8778
19: 33739 1748 6 9884
20: 33739 1748 7 10365
summarization:
> MeanCount <- DailyDataICount[,mean(Qty),by="ID,WeekDay"]
> setnames(MeanCount,"V1","ObjQty")
> MeanCount
ID WeekDay ObjQty
1: 33739 2 9047.667
2: 33739 3 9084.000
3: 33739 4 8961.000
4: 33739 5 8981.333
5: 33739 6 10008.333
6: 33739 7 7310.667
7: 33739 1 8237.000
join:
> Count_SM_FM
ID WeekDay ItemType QtyType
1: 33739 2 FM 995
2: 33739 2 SM 15241
3: 33739 3 FM 1036
4: 33739 3 SM 15387
5: 33739 4 FM 1002
6: 33739 4 SM 15449
7: 33739 5 FM 990
8: 33739 5 SM 15388
9: 33739 6 FM 1344
10: 33739 6 SM 16016
11: 33739 7 SM 13840
12: 33739 1 FM 1015
13: 33739 1 SM 11456
14: 33739 7 FM 1710
> setkey(Count_SM_FM,ID,WeekDay)
> Count_SM_FM <- Count_SM_FM[MeanCount]
>
> Count_SM_FM
ID WeekDay ItemType QtyType ObjQty
1: 33739 2 FM 995 9047.667
2: 33739 2 SM 15241 9047.667
3: 33739 3 FM 1036 9084.000
4: 33739 3 SM 15387 9084.000
5: 33739 4 FM 1002 8961.000
6: 33739 4 SM 15449 8961.000
7: 33739 5 FM 990 8981.333
8: 33739 5 SM 15388 8981.333
9: 33739 6 FM 1344 10008.333
10: 33739 6 SM 16016 10008.333
11: 33739 7 SM 13840 7310.667
12: 33739 7 FM 1710 7310.667
13: 33739 1 FM 1015 8237.000
14: 33739 1 SM 11456 8237.000
is it because it's more efficient?
I found this out because I had to join Count_SM_FM
with a different data.table but using the same key (which I hadn't set again) and I got an error. I will solve the immediate problem by setting the key again. I know this is a bit of philosophical question, not keeping the order probably usually saves time but if in most cases people need to use the results as sorted...
Thanks!
Upvotes: 2
Views: 81
Reputation: 118889
The order of elements in a join of the form x[i]
is determined by i
. In your case, i
is MeanCount
and x
is Count_SM_FM
and i
is not sorted.
To get the result you seek, you should just do:
setkey(Count_SM_FM, ID, WeekDay)
setkey(MeanCount, ID, WeekDay) ## Just add this line
ans = Count_SM_FM[MeanCount]
# ID WeekDay ItemType QtyType ObjQty
# 1: 33739 1 FM 1015 8237.000
# 2: 33739 1 SM 11456 8237.000
# 3: 33739 2 FM 995 9047.667
# 4: 33739 2 SM 15241 9047.667
# 5: 33739 3 FM 1036 9084.000
# 6: 33739 3 SM 15387 9084.000
# 7: 33739 4 FM 1002 8961.000
# 8: 33739 4 SM 15449 8961.000
# 9: 33739 5 FM 990 8981.333
# 10: 33739 5 SM 15388 8981.333
# 11: 33739 6 FM 1344 10008.333
# 12: 33739 6 SM 16016 10008.333
# 13: 33739 7 SM 13840 7310.667
# 14: 33739 7 FM 1710 7310.667
Upvotes: 3