The row name is a decimal number

Question

I subset a dataset and this results in a data frame with non-integer row names. Could you please explain the reason behind this phenomenon?

library(outbreaks)
df <- measles_hagelloch_1861[, 3, drop = FALSE]
df$disease <- 1
index <- sample(1:50, 50, replace = TRUE, prob = NULL)
syn_df <- df[index, ]

The result is

Gregor Thomas · Accepted Answer

When you sample with replacement, you can draw the same row more than once, which would give you duplicate row names. Row names in a data frame must be unique, so R appends a suffix (.1, .2, and so on) to the repeats to make them unique.

A simple example, repeating the first row of the iris dataset.

iris[1, ]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa

iris[c(1, 1), ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1            5.1         3.5          1.4         0.2  setosa
# 1.1          5.1         3.5          1.4         0.2  setosa

iris[c(1, 1, 1), ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1            5.1         3.5          1.4         0.2  setosa
# 1.1          5.1         3.5          1.4         0.2  setosa
# 1.2          5.1         3.5          1.4         0.2  setosa
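
The same suffix pattern can be reproduced on a plain character vector with base R's make.unique(). This is only a sketch to show the naming scheme; the variable names are made up for illustration.

```r
# Hypothetical row labels, with "1" repeated three times
labels <- c("1", "1", "1")

# make.unique() leaves the first occurrence alone and appends
# .1, .2, ... to later duplicates
suffixed <- make.unique(labels)
print(suffixed)
# [1] "1"   "1.1" "1.2"
```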

Generally, I'd advise against relying on row names for anything. If you want to track observations, add some sort of ID column.
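
A minimal sketch of the ID column idea, using iris and a fixed index instead of sample() so the result is reproducible. The column name id and the object names are just assumptions for the example.

```r
# Small data frame with an explicit ID column (the name "id" is arbitrary)
df <- iris[1:3, ]
df$id <- seq_len(nrow(df))

# Pick row 1 twice and row 3 once, as sampling with replacement might
idx <- c(1, 1, 3)
resampled <- df[idx, ]

rownames(resampled)  # "1" "1.1" "3": the duplicate gets a suffix
resampled$id         # 1 1 3: the ID still points at the original rows

# Optionally reset the row names to plain 1, 2, 3
rownames(resampled) <- NULL
```

The id column keeps track of which original observation each row came from, so the row names no longer carry any information and can be reset.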
