I subset a dataset, and the result is a data frame with non-integer row names. Could you please explain the reason behind this?
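library(outbreaks)
# keep only the third column; drop = FALSE keeps the result a data frame
df <- measles_hagelloch_1861[, 3, drop = FALSE]
df$disease <- 1
# draw 50 row indices from 1:50 with replacement, so some indices repeat
index <- sample(1:50, 50, replace = TRUE, prob = NULL)
syn_df <- df[index, ]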
The result is
When you sample with replacement, you end up with duplicate row names (the same row is sampled more than once). Row names must be unique, so R appends .1, .2, and so on to the repeated names to make them unique.
A simple example, repeating the first row of the iris dataset:
iris[1, ]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
iris[c(1, 1), ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1            5.1         3.5          1.4         0.2  setosa
# 1.1          5.1         3.5          1.4         0.2  setosa
iris[c(1, 1, 1), ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1            5.1         3.5          1.4         0.2  setosa
# 1.1          5.1         3.5          1.4         0.2  setosa
# 1.2          5.1         3.5          1.4         0.2  setosa
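The suffix pattern is the same one produced by base R's make.unique(); as far as I know, the data frame subsetting method calls it internally when the selected row names contain duplicates. A small sketch, using a made-up two-row data frame (toy) purely for illustration:

```r
# make.unique() is base R: the first occurrence of each string is kept as-is,
# and later repeats get ".1", ".2", ... appended.
row_ids <- c("1", "1", "1", "2", "2")
unique_ids <- make.unique(row_ids)
print(unique_ids)  # values: "1" "1.1" "1.2" "2" "2.1"

# Subsetting a (hypothetical) data frame with repeated indices gives the same names.
toy <- data.frame(x = c(10, 20))
picked <- toy[c(1, 1, 1, 2, 2), , drop = FALSE]  # drop = FALSE keeps a data frame
print(rownames(picked))
```

With a single-column data frame, drop = FALSE matters here just as in the question's code; without it, the subset would collapse to a plain vector with no row names at all.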
Generally, I'd advise against relying on row names for anything. If you want to track observations, add an explicit ID column.
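A minimal sketch of that suggestion, again with iris; the column name id and the sample size of 10 are arbitrary choices for illustration. Because the ID travels with each row, the row names can be reset to 1..n afterwards without losing track of where each draw came from.

```r
set.seed(42)                        # make the random draw reproducible
dat <- iris
dat$id <- seq_len(nrow(dat))        # explicit ID = original row position

# sample 10 rows with replacement, as in the question
idx <- sample(seq_len(nrow(dat)), 10, replace = TRUE)
boot <- dat[idx, ]

rownames(boot) <- NULL              # optional: reset row names to "1".."10"

# the id column still records which original row each draw came from
print(boot$id)
print(head(boot))
```

Resetting the row names is purely cosmetic here; any analysis that needs to know which observation was drawn should use boot$id.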