Rollo99

Reputation: 1613

Sampling with row number in R

Let's say I have a much bigger dataset than the one below:

df = data.frame(x = c("ciao mondo", "hello world", "ciao world","hello mondo","bye mondo","ciao ciao mondo"))

I want to sample a few rows at random, without replacement, so I do:

sample(df$x, size = 3, replace = FALSE)

The problem is that this returns only the values, so I lose the original row indices of the sampled rows. My dataset is quite big, so using something like grepl() to look the indices up again afterwards is inefficient.

Do you have any idea how to do this?

Thanks a lot!

Upvotes: 0

Views: 444

Answers (2)

Jon Spring

Reputation: 66480

You could make the row number into a column, and then sample rows from that data frame.

df$row <- seq_len(nrow(df))                 # store the original row number as a column
df[sample(nrow(df), 3, replace = FALSE), ]  # sample 3 row positions, then subset

Result after set.seed(0):

               x row
6 ciao ciao mondo   6
1      ciao mondo   1
4     hello mondo   4
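The row names in the output above (6, 1, 4) already match the row column. As a minimal sketch, assuming df still has the default automatic row names 1..n that data.frame() creates, the original indices can also be read back from the row names of the subset without adding a column (sampled and orig_idx are just illustrative names):

```r
df <- data.frame(x = c("ciao mondo", "hello world", "ciao world",
                       "hello mondo", "bye mondo", "ciao ciao mondo"))

set.seed(0)
# Sample 3 row positions without replacement; drop = FALSE keeps this
# one-column data frame from collapsing to a plain vector (and losing row names)
sampled <- df[sample(nrow(df), 3, replace = FALSE), , drop = FALSE]

# Assumes default row names: the subset's row names are the original row numbers
orig_idx <- as.integer(rownames(sampled))
orig_idx
```

If the data frame has been reordered or has custom row names, the explicit row column is the safer choice.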

Upvotes: 1

akrun

Reputation: 887108

Instead of sampling the column values, sample the sequence of row numbers. That returns the row indices, which you can then use to subset the rows.

i1 <- sample(seq_len(nrow(df)), size = 3, replace = FALSE)  # sampled row indices
df[i1, , drop = FALSE]  # subset the rows; drop = FALSE keeps a data frame
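A minimal sketch, using the df from the question, that keeps each sampled index right next to its value (result is a hypothetical name, not part of the original answer):

```r
df <- data.frame(x = c("ciao mondo", "hello world", "ciao world",
                       "hello mondo", "bye mondo", "ciao ciao mondo"))

# Sample row positions rather than values
i1 <- sample(seq_len(nrow(df)), size = 3, replace = FALSE)

# Pair each original row index with its value; no lookup such as grepl() is needed
result <- data.frame(row = i1, x = df$x[i1])
result
```

sample.int(nrow(df), 3) should draw the same indices directly, since sample() on a vector of length > 1 indexes it with sample.int() internally.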

Upvotes: 1
