Filip Herring

Reputation: 13

duplicate row.names after deleting and adding back rows

I have a dataframe:

df <- data.frame(id = as.integer(integer()),
                   points = as.integer(integer()),
                   row.names = 1,
                   stringsAsFactors = FALSE)

When adding IDs, if a given ID already exists, its points are set to the predefined constant max_points; otherwise, the ID is created:

IDs <- c(1,2,3,20,30,55)     # assume these values have been generated
df[IDs, ] <- max_points

If points in some rows reach zero, the rows are removed:

df <- subset(df, points > 0)

However, after certain rows are deleted and a new value is later added back in their place, the duplicate row.names error shows up:

> df
   points
7    2
8    2
13   2
14   2
15   2
16   2
17   2
18   2
> df[13, ] <- 13
> df
Error in data.frame(points = c(" 2", " 2", " 2", " 2", " 2", " 2", " 2",  : 
  duplicate row.names: 13

Upon further inspection, the new data frame looks like this:

    points
 7    2 
 8    2
13    2
14    2
15    2
16    2
17    2 
18    2
 9   NA
10   NA
11   NA
12   NA
13   13

Why does it behave this way? Is there any way around this?

EDIT

To reproduce the problem, here's a code snippet:

IDs <- c(13,14,15,8,16,17,18,7)
df <- data.frame(ID = as.integer(integer()),
                       points = as.integer(integer()),
                       row.names = 1,
                       stringsAsFactors = FALSE)
df[IDs, ] <- 2
df <- subset(df, points > 0)
df[13, ] <- 13

Upvotes: 1

Views: 31

Answers (1)

Arsak

Reputation: 161

I guess the problem arises in this line:

df[13,] <- 13

Here, you are assigning a value to the thirteenth row by position. Since your df has only 8 rows, additional NA rows are created in between (rows 9 to 12), and the new thirteenth row gets the name 13, which already exists. I think you wanted to assign the value to the row named "13", hence

df["13",] <- 13
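
To illustrate, here is a minimal sketch, assuming the data frame is in the state shown in the question after subset() (the row names are copied from that output; the ID "9" and the value 5 are made up for the example). It indexes by character, so R matches against row names instead of positions:

```r
# State after subset(): 8 rows whose names are the surviving IDs
# (copied from the output in the question).
df <- data.frame(points = rep(2L, 8),
                 row.names = c("7", "8", "13", "14", "15", "16", "17", "18"))

# A character index is matched against row names,
# so the existing row "13" is updated in place.
df["13", ] <- 13
stopifnot(nrow(df) == 8, df["13", "points"] == 13)

# A name that does not exist yet is appended as a new row.
# "9" and 5 are illustrative values, not from the question.
df["9", ] <- 5
stopifnot(nrow(df) == 9, anyDuplicated(rownames(df)) == 0)
```

The same idea should carry over to the first assignment in the question: indexing with as.character(IDs) instead of the numeric IDs would look rows up by name rather than by position.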

Upvotes: 1
