Reputation: 11657
Let's say I have the following dataframe:
personid  date  measurement
1         x     23
1         x     32
2         y     21
3         x     23
3         z     23
3         y     23
I want to sort this dataframe by the measurement column, and then create a new column that is a sequence along the sorted measurement column, like so:
personid  date  measurement  id
1         x     23           2
1         x     32           3
2         y     21           1
3         x     23           2
3         z     23           2
3         y     23           2
My first instinct was to do something like:
unique_measurements <- data.frame(unique(sort(df$measurement)))
unique_measurements$counter <- 1:nrow(unique_measurements)
Now I basically have a data-frame that represents a mapping from a given measurement to the correct counter. I recognize this is the wrong way of doing this, but (1) how would I actually use this mapping to achieve my goals; (2) what is the right way of doing this?
Upvotes: 2
Views: 457
Reputation: 145755
Using factor as an intermediate:
df$id = as.integer(factor(df$measurement))
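To see why this works, here is a minimal sketch on the sample data from the question (the data frame is reconstructed by hand from the table shown there, so the column types are an assumption). A factor built from a numeric vector takes the sorted unique values as its levels, and as.integer() returns each value's position among those levels.

```r
# Sample data reconstructed from the question (column types are assumed)
df <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)

# The factor's levels are the sorted unique measurements
f <- factor(df$measurement)
levels(f)
# [1] "21" "23" "32"

# The integer codes are each row's position among those levels
df$id <- as.integer(f)
df$id
# [1] 2 3 1 2 2 2
```

Tied measurements share the same id, and the row order of df is left untouched.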
If you want to use your method, just use merge (though it may change the row order; use dplyr::left_join or data.table::merge instead to preserve the row order of the original).
unique_measurements <- data.frame(measurement = sort(unique(df$measurement)))
unique_measurements$id <- 1:nrow(unique_measurements)
merge(df, unique_measurements)
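If you would rather stay in base R and still keep the original row order, one possible workaround is to record the row position before merging and sort by it afterwards. This is only a sketch: the helper column name row and the table name lookup are made up for illustration, and the sample data is reconstructed by hand from the question.

```r
# Sample data reconstructed from the question (column types are assumed)
df <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)

# Mapping from each distinct measurement to its counter
lookup <- data.frame(measurement = sort(unique(df$measurement)))
lookup$id <- seq_len(nrow(lookup))

# Remember the original row order in a temporary column ("row" is a made-up name)
df$row <- seq_len(nrow(df))

# merge() may reorder rows, so restore the order afterwards and drop the helper
merged <- merge(df, lookup, by = "measurement")
merged <- merged[order(merged$row), c("personid", "date", "measurement", "id")]
rownames(merged) <- NULL

merged$id
# [1] 2 3 1 2 2 2
```

The explicit column selection also puts measurement back in its original position, since merge() moves the key column to the front.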
Upvotes: 2
Reputation: 5215
Here's a simpler way to do this:
df$id <- match(df$measurement, sort(unique(df$measurement)))
#   personid date measurement id
# 1        1    x          23  2
# 2        1    x          32  3
# 3        2    y          21  1
# 4        3    x          23  2
# 5        3    z          23  2
# 6        3    y          23  2
Upvotes: 4