Reputation: 13
I have a dataframe:
df <- data.frame(id = as.integer(integer()),
points = as.integer(integer()),
row.names = 1,
stringsAsFactors = FALSE)
When adding IDs, if a given ID already exists, its points are set to the predefined constant max_points; if it does not exist, it is created:
IDs <- c(1,2,3,20,30,55) # assume these values have been generated
df[IDs, ] <- max_points
If points in some rows reach zero, the rows are removed:
df <- subset(df, points > 0)
However, after certain rows are deleted and a new value is later added back in their place, a duplicate row.names error shows up:
> df
points
7 2
8 2
13 2
14 2
15 2
16 2
17 2
18 2
> df[13, ] <- 13
> df
Error in data.frame(points = c(" 2", " 2", " 2", " 2", " 2", " 2", " 2", :
duplicate row.names: 13
Upon further inspection, the new data frame looks like this:
points
7 2
8 2
13 2
14 2
15 2
16 2
17 2
18 2
9 NA
10 NA
11 NA
12 NA
13 13
Why does it behave this way? Is there any way around this?
EDIT
Here's a code snippet that reproduces the problem:
IDs <- c(13,14,15,8,16,17,18,7)
df <- data.frame(ID = as.integer(integer()),
points = as.integer(integer()),
row.names = 1,
stringsAsFactors = FALSE)
df[IDs, ] <- 2
df <- subset(df, points > 0)
df[13, ] <- 13
Upvotes: 1
Views: 31
Reputation: 161
I guess the problem arises in this line:
df[13,] <- 13
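Here, you are assigning a value to the thirteenth row. Since your df has fewer rows, additional NA rows are created in between. Those filler rows are named by position (9 to 13 in your output), so the new row 13 collides with the existing row name "13". I think you wanted to assign the value to the row named "13", hence: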
df["13",] <- 13
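To make the difference concrete, here is a minimal sketch on a small toy frame. The frame, the IDs, and the max_points value are made up for illustration and are not taken from your data. It contrasts indexing by position with indexing by row name, and shows one possible way to apply the same idea to a whole vector of numeric IDs by converting them with as.character():

```r
# Toy frame whose row names are IDs, not positions (values are illustrative).
df <- data.frame(points = c(2L, 2L, 2L), row.names = c("7", "8", "13"))

# Numeric index: interpreted as a row position, so the frame is padded
# with NA rows until position 13 exists.
by_position <- df
by_position[13, ] <- 13
nrow(by_position)

# Character index: matched against the row names, so the existing row "13"
# is updated and an unknown name such as "20" is appended as a new row.
by_name <- df
by_name["13", ] <- 13
by_name["20", ] <- 5
by_name

# Same idea for a vector of numeric IDs (max_points is a placeholder value):
# existing ID 7 is overwritten, new ID 30 is appended.
IDs <- c(7, 30)
max_points <- 10
by_name[as.character(IDs), ] <- max_points
by_name
```

With character indices the row names stay unique, because an ID is either matched to its existing row or added under its own name, so the duplicate row.names error should not come up.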
Upvotes: 1