Akira

Reputation: 2870

The row name is a decimal number

I subset a dataset, and the result is a data frame with non-integer row names. Could you please explain the reason behind this?

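library(outbreaks)
# Keep only the third column, as a data frame rather than a vector
df <- measles_hagelloch_1861[, 3, drop = FALSE]
df$disease <- 1
# Draw 50 row indices from 1:50 with replacement, so repeats are likely
index <- sample(1:50, 50, replace = TRUE, prob = NULL)
syn_df <- df[index, ]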

The result is a data frame with decimal row names (screenshot not preserved).

Upvotes: 1

Views: 293

Answers (1)

Gregor Thomas

Reputation: 145775

When you sample with replacement, you can draw the same row more than once, which would give duplicate row names. Row names must be unique, so R appends a suffix (.1, .2, and so on) to each repeated name to make them unique.

A simple example, repeating the first row of the iris dataset.

iris[1, ]
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa

iris[c(1, 1), ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1            5.1         3.5          1.4         0.2  setosa
# 1.1          5.1         3.5          1.4         0.2  setosa

iris[c(1, 1, 1),]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1            5.1         3.5          1.4         0.2  setosa
# 1.1          5.1         3.5          1.4         0.2  setosa
# 1.2          5.1         3.5          1.4         0.2  setosa
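
The suffixes follow the same pattern that base R's make.unique() produces for repeated strings. A minimal sketch to see this directly, assuming nothing beyond base R and the built-in iris data:

```r
# Repeated names get ".1", ".2", ... appended, in order of appearance
make.unique(c("1", "1", "1"))
# [1] "1"   "1.1" "1.2"

# The row names of the subset above match that pattern
dup_rows <- iris[c(1, 1, 1), ]
identical(rownames(dup_rows), make.unique(c("1", "1", "1")))
# [1] TRUE
```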

Generally, I'd advise against relying on row names for anything. If you want to track observations, add some sort of ID column.
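
As a rough sketch of that suggestion, here is one way it could look with base R. The column name `id` and the use of the first five rows of iris are illustrative assumptions, not part of the original question:

```r
# Sketch: keep an explicit ID column instead of relying on row names.
df <- iris[1:5, ]
df$id <- seq_len(nrow(df))   # hypothetical ID column, one value per original row

set.seed(1)                  # only so the sketch is reproducible
index <- sample(seq_len(nrow(df)), 5, replace = TRUE)
syn_df <- df[index, ]

# Row names may now carry suffixes such as "2.1",
# but `id` still records which original row each sampled row came from.
rownames(syn_df) <- NULL     # optional: reset row names to 1, 2, ..., n
```

After the reset, the row names are plain integers again, and syn_df$id can be used to trace each sampled row back to its source.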

Upvotes: 2
