Let's say I have a data frame with two variables and 213005 observations. It looks like this:
df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205),
date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02", "2012/01/01", "2012/01/01", "2012/01/05"))
I need to create a new column called "new" that gives each distinct combination of "nr" and "date" values its own number. It should look like this:
df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205),
date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
"2012/01/01", "2012/01/01", "2012/01/05"),
new=c(1, 2, 3, 4, 5, 5, 6))
(nr=233, date=2012/01/02) => (new=1)
(nr=233, date=2012/01/01) => (new=2) ...
For (nr=234, date=2012/01/01) there are two identical rows, and both should get new=5; repeated rows should stay in the data frame.
Does anyone know how to do that? Any help would be much appreciated. Thank you!
Using base R:
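# Paste the columns of each row into one key, e.g. "233 2012/01/02"
v1 <- do.call(paste, df)
# Number the keys in order of first appearance
df$new <- as.numeric(factor(v1, levels=unique(v1)))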
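The levels=unique(v1) argument is what makes the numbering follow the order of first appearance. The small sketch below (rebuilding the question's example data; the names first_seen and sorted_ids are only for illustration) contrasts it with a plain factor() call, whose default levels are sorted and therefore number the keys differently.

```r
df <- data.frame(nr = c(233, 233, 232, 231, 234, 234, 205),
                 date = c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
                          "2012/01/01", "2012/01/01", "2012/01/05"))

# One key per row, e.g. "233 2012/01/02"
v1 <- do.call(paste, df)

# levels = unique(v1): IDs follow the order in which keys first appear
first_seen <- as.numeric(factor(v1, levels = unique(v1)))

# Default levels are sorted, so IDs follow sorted key order instead
sorted_ids <- as.numeric(factor(v1))

print(first_seen)  # 1 2 3 4 5 5 6
print(sorted_ids)  # 5 4 3 2 6 6 1
```

Both versions give duplicate rows the same ID; only the sorted one breaks the "first appearance" ordering asked for in the question.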
I'm not entirely sure I understand the logic, but it seems like you want to group by both columns. Here's a simple data.table solution using .GRP:
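library(data.table)
# Convert df to a data.table by reference, assign each (nr, date) group its
# group number, and print the result with the trailing []
setDT(df)[, new := .GRP, .(nr, date)][]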
# nr date new
# 1: 233 2012/01/02 1
# 2: 233 2012/01/01 2
# 3: 232 2012/01/01 3
# 4: 231 2012/01/02 4
# 5: 234 2012/01/01 5
# 6: 234 2012/01/01 5
# 7: 205 2012/01/05 6
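If data.table is not available, the same numbering (groups counted in order of first appearance, as in the output above) can be sketched in base R with match() against the unique keys. This is a swapped-in technique rather than part of either answer; the helper name group_id is hypothetical, and the key construction assumes the "|" separator never occurs in the data.

```r
# Hypothetical helper: number each distinct combination of `cols`
# in order of first appearance
group_id <- function(data, cols) {
  # One key per row; assumes "|" never appears in the values
  key <- do.call(paste, c(data[cols], sep = "|"))
  # unique() keeps keys in first-appearance order, and match() returns
  # each row's position among those unique keys
  match(key, unique(key))
}

df <- data.frame(nr = c(233, 233, 232, 231, 234, 234, 205),
                 date = c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
                          "2012/01/01", "2012/01/01", "2012/01/05"))
df$new <- group_id(df, c("nr", "date"))
print(df)
```

Because match() returns positions directly, this avoids building a factor, and the repeated (234, 2012/01/01) rows both receive the same ID.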