Reputation: 1
Newbie question I have 2 columns in a data frame that looks like
Name Size
A 1
A 1
A 1
A 2
A 2
B 3
B 5
C 7
C 17
C 17
I need a third column that will run continuously as a sequence until either Name or Size changes value
Name Size NewCol
A 1 1
A 1 2
A 1 3
A 2 1
A 2 2
B 3 1
B 5 1
C 7 1
C 17 1
C 17 2
Basically a dummy field to reference each record separately even if Name and Size are the same.
So the index changes from k to k+1 when it encounters both same values for Name and Size otherwise resets.
Therefore in my data set if I have 200 A and 1s suppose each will be indexed between 1..200. Then when it moves to A and 2 the index shall reset
Upvotes: 0
Views: 1006
Reputation: 5272
I guess this might not be the most efficient solution, but at least a good start :
# Reproducing the example
df <- data.frame(Name=LETTERS[c(1, 1, 1, 1, 1, 2, 2, 3, 3, 3)], Size=c(1, 1, 1, 2, 2, 3, 5, 7, 17, 17))
# Create new colum with unique id
df$NewCol <- paste0(df$Name, df$Size)
# Modify column to write count instead
df$NewCol <- unlist(sapply(unique(df$NewCol), function(id) 1:table(df$NewCol)[id]))
df
Name Size NewCol
1 A 1 1
2 A 1 2
3 A 1 3
4 A 2 1
5 A 2 2
6 B 3 1
7 B 5 1
8 C 7 1
9 C 17 1
10 C 17 2
Upvotes: 0
Reputation: 887118
We can try with data.table
library(data.table)
setDT(df1)[, NewCol := match(Size, unique(Size)), by = .(Name)]
df1
# Name Size NewCol
#1: A 1 1
#2: A 1 1
#3: A 2 2
#4: B 3 1
#5: C 7 1
#6: C 17 2
If there is a typo somewhere in the expected output, may be this would be the output
setDT(df1)[, NewCol := seq_len(.N), .(Name, Size)]
Or using dplyr
library(dplyr)
df1 %>%
group_by(Name) %>%
mutate(NewCol = match(Size, unique(Size)))
Or
df1 %>%
group_by(Name) %>%
mutate(NewCol = row_number())
Or we can use the same approach with ave
from base R
Upvotes: 1