Sumit Dhar
Sumit Dhar

Reputation: 1

Need a index column in R data frame to differentiate variables with same value

Newbie question I have 2 columns in a data frame that looks like

Name Size
 A     1
 A     1
 A     1
 A     2
 A     2
 B     3
 B     5
 C     7
 C     17
 C     17

I need a third column that will run continuously as a sequence until either Name or Size changes value

    Name Size NewCol
    A     1   1
    A     1   2
    A     1   3
    A     2   1
    A     2   2
    B     3   1
    B     5   1
    C     7   1
    C     17  1
    C     17  2

Basically a dummy field to reference each record separately even if Name and Size are the same.

So the index changes from k to k+1 when it encounters both same values for Name and Size otherwise resets.

Therefore in my data set if I have 200 A and 1s suppose each will be indexed between 1..200. Then when it moves to A and 2 the index shall reset

Upvotes: 0

Views: 1006

Answers (2)

ClementWalter
ClementWalter

Reputation: 5272

I guess this might not be the most efficient solution, but at least a good start :

# Reproducing the example
df <- data.frame(Name=LETTERS[c(1, 1, 1, 1, 1, 2, 2, 3, 3, 3)], Size=c(1, 1, 1, 2, 2, 3, 5, 7, 17, 17))

# Create new colum with unique id
df$NewCol <- paste0(df$Name, df$Size)

# Modify column to write count instead
df$NewCol <- unlist(sapply(unique(df$NewCol), function(id) 1:table(df$NewCol)[id]))

df
   Name Size NewCol
1     A    1      1
2     A    1      2
3     A    1      3
4     A    2      1
5     A    2      2
6     B    3      1
7     B    5      1
8     C    7      1
9     C   17      1
10    C   17      2

Upvotes: 0

akrun
akrun

Reputation: 887118

We can try with data.table

library(data.table)
setDT(df1)[, NewCol := match(Size, unique(Size)), by = .(Name)]
df1
#   Name Size NewCol
#1:    A    1      1
#2:    A    1      1
#3:    A    2      2
#4:    B    3      1
#5:    C    7      1
#6:    C   17      2

If there is a typo somewhere in the expected output, may be this would be the output

setDT(df1)[, NewCol := seq_len(.N), .(Name, Size)]

Or using dplyr

library(dplyr)
df1 %>%
   group_by(Name) %>%
   mutate(NewCol = match(Size, unique(Size)))

Or

df1 %>%
   group_by(Name) %>%
   mutate(NewCol = row_number())

Or we can use the same approach with ave from base R

Upvotes: 1

Related Questions