Reputation: 417
I'm trying to cut my data to build a frequency distribution, but after cut() all the data is assigned to one interval
points <- 224 * 0:5
cut_data <- cut(rs$amount, points ,dig.lab = 10)
My rs$amount data:
integer64
[1] 517 200 391 186 262 1020 791 124 437 238 896 212 144 529 523 190
And I get something like this
> cut_data
[1] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224] (0,224]
[15] (0,224] (0,224]
Levels: (0,224] (224,448] (448,672] (672,896] (896,1120]
What am I doing wrong?
EDIT:
result of dput() on rs$amount
structure(c(2.55431938899924e-321, 9.88131291682493e-322, 1.93179667523927e-321,
9.18962101264719e-322, 1.29445199210407e-321, 5.03946958758071e-321,
3.90805925860426e-321, 6.12641400843146e-322, 2.15906687232625e-321,
1.17587623710217e-321, 4.42682818673757e-321, 1.04741916918344e-321,
7.11454530011395e-322, 2.61360726650019e-321, 2.58396332774972e-321,
9.38724727098368e-322), class = "integer64")
EDIT2:
Casting rs$amount to numeric fixed the issue:
cut_data <- cut(as.numeric(rs$amount),points,dig.lab = 10)
Upvotes: 1
Views: 227
Reputation: 161085
I think you have two alternatives: use cut(as.numeric(vec), ...) or findInterval.
If you are not concerned about the theoretical precision loss when converting from integer64 to numeric (it only affects magnitudes above 2^53, so it might be hard to find this happening), then you can convert to numeric:
cut(as.numeric(vec), points ,dig.lab = 10)
# [1] (448,672] (0,224] (224,448] (0,224] (224,448] (896,1120] (672,896] (0,224] (224,448] (224,448] (672,896] (0,224] (0,224] (448,672] (448,672] (0,224]
# Levels: (0,224] (224,448] (448,672] (672,896] (896,1120]
table(cut(vec, points ,dig.lab = 10))
# (0,224] (224,448] (448,672] (672,896] (896,1120]
# 16 0 0 0 0
table(findInterval(vec, points))
# 1 2 3 4 5
# 6 4 3 1 2
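For context on why the un-cast cut() call puts every value in the first bin: as far as I understand bit64's storage, an integer64 keeps its 64-bit integer in the bytes of a double, and cut() reads that underlying double rather than the integer. The dput() output in the question is consistent with this (517 shows up as roughly 2.55e-321). A minimal sketch of the effect using plain doubles, so bit64 is not needed; the name as_seen_by_cut and the n * 2^(-1074) relationship are my illustration of that assumption, not something taken from bit64's documentation:

```r
points  <- 224 * 0:5
amounts <- c(517, 200, 391, 186, 262, 1020, 791, 124)  # first few values from the question

# Assumption: a small positive integer n stored as 64-bit integer bits and then
# read back as a double equals n * 2^(-1074), i.e. n steps of the smallest denormal.
as_seen_by_cut <- amounts * 2^(-1074)

# Every value is a tiny positive number, far below the first break at 224,
# so all of them fall into the (0,224] bin.
table(cut(as_seen_by_cut, points, dig.lab = 10))
```

Converting with as.numeric() first (or letting findInterval do the conversion) gives cut the real magnitudes instead of these near-zero values.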
You can mock this to produce similarly-formatted factors manually. findInterval uses left-closed intervals by default, so pass left.open = TRUE to match cut's (a,b] bins (otherwise a boundary value such as 896 lands one bin higher):
labels <- sprintf("(%i,%i]", points[-length(points)], points[-1])
labels
# [1] "(0,224]" "(224,448]" "(448,672]" "(672,896]" "(896,1120]"
factor(labels[findInterval(vec, points, left.open = TRUE)], levels = labels)
# [1] (448,672] (0,224] (224,448] (0,224] (224,448] (896,1120] (672,896] (0,224] (224,448] (224,448] (672,896] (0,224] (0,224] (448,672] (448,672] (0,224]
# Levels: (0,224] (224,448] (448,672] (672,896] (896,1120]
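A quick way to check that the findInterval route reproduces cut's bins is to compare the bin indices directly. This is only a sketch: x is a plain double vector standing in for as.numeric(rs$amount) (an assumption, so the snippet runs without bit64), and idx_default / idx_open are names I made up for illustration.

```r
points <- 224 * 0:5
# Plain doubles with the same values as the question's data (stand-in for integer64)
x <- c(517, 200, 391, 186, 262, 1020, 791, 124, 437, 238, 896, 212, 144, 529, 523, 190)

by_cut <- cut(x, points, dig.lab = 10)

# Default findInterval bins are [a, b); left.open = TRUE switches to (a, b]
idx_default <- findInterval(x, points)
idx_open    <- findInterval(x, points, left.open = TRUE)

# Only the value sitting exactly on a break is binned differently
x[idx_default != idx_open]
# [1] 896

# With left.open = TRUE the bin indices agree with cut()
all(idx_open == as.integer(by_cut))
# [1] TRUE
```

Values outside the breaks behave differently in the two functions (cut returns NA, findInterval returns 0 or the last index), so this equivalence is only checked here for data inside (0, 1120].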
Data
vec <- structure(c(2.55431938899924e-321, 9.88131291682493e-322, 1.93179667523927e-321, 9.18962101264719e-322, 1.29445199210407e-321, 5.03946958758071e-321, 3.90805925860426e-321, 6.12641400843146e-322, 2.15906687232625e-321, 1.17587623710217e-321, 4.42682818673757e-321, 1.04741916918344e-321, 7.11454530011395e-322, 2.61360726650019e-321, 2.58396332774972e-321, 9.38724727098368e-322), class = "integer64")
Upvotes: 3