serpiko

Reputation: 1

sample function in R

I have just started learning R using RStudio and I have some, perhaps basic, questions. One of them concerns the sample function. My dataset consists of 402224 observations of 147 variables. My task is to take a sample of 50 observations and then build a data frame from them, and so on. But when I run y = sample(mydata, 50, replace = TRUE, prob = NULL), the result is a dataset with 402224 observations of 50 variables. That is, the sampling is done over variables and not observations.

Do you have any idea why this happens? Thank you in advance.

Upvotes: 0

Views: 4495

Answers (3)

Kristofersen

Reputation: 2806

The other answers select rows, but it looks like you are after columns. You can accomplish this in a similar way.

Here's a sample df.

df = data.frame(a = 1:5, b = 6:10, c = 11:15)
> df
  a  b  c
1 1  6 11
2 2  7 12
3 3  8 13
4 4  9 14
5 5 10 15

Then, to randomly select 2 columns and keep all observations, we could do this:

> df[ , sample(1:ncol(df), 2)]
   c a
1 11 1
2 12 2
3 13 3
4 14 4
5 15 5

So, what you'll want to do is something like this:

y = mydata[ , sample(1:ncol(mydata), 50)]
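
One caveat with this indexing style, shown as a minimal sketch on the same toy df: if only one column is drawn, base R simplifies the result to a plain vector unless drop = FALSE is passed. The seed value is an arbitrary assumption, used only to make the draw repeatable.

```r
# Same toy data frame as above.
df <- data.frame(a = 1:5, b = 6:10, c = 11:15)

set.seed(123)  # arbitrary seed, only for reproducibility

# Drawing a single column: the result is simplified to a vector.
one_col <- df[, sample(seq_len(ncol(df)), 1)]
is.data.frame(one_col)     # FALSE

# drop = FALSE keeps the data frame structure even for one column.
one_col_df <- df[, sample(seq_len(ncol(df)), 1), drop = FALSE]
is.data.frame(one_col_df)  # TRUE
```

seq_len(ncol(df)) is used here instead of 1:ncol(df) only because it also behaves sensibly for a data frame with zero columns.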

Upvotes: 1

amonk

Reputation: 1797

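That is because sample() works on vectors, and a data frame is a list of columns, so sample() draws whole columns rather than rows. To sample rows, try the following:

 library(data.table)
 set.seed(10)                      # make the draw reproducible
 df_sample <- data.table(mydata)   # convert the data frame to a data.table
 df_sample[sample(.N, 50)]         # .N is the number of rows; draw 50 of them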
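
To see why sample() picks variables when handed a data frame, here is a minimal base R sketch. The data frame toy is a made-up stand-in for mydata (5 rows, 3 columns), and the seed is arbitrary.

```r
# Hypothetical toy data standing in for mydata.
toy <- data.frame(a = 1:5, b = 6:10, c = 11:15)

# A data frame is a list of columns, so its length is the column count.
is.list(toy)   # TRUE
length(toy)    # 3

set.seed(1)    # arbitrary seed, only for reproducibility

# sample() draws elements of that list, i.e. whole columns.
cols <- sample(toy, 2)
dim(cols)      # 5 2  (all rows, 2 columns)

# Sampling row indices and subsetting draws observations instead.
rows <- toy[sample(nrow(toy), 2), ]
dim(rows)      # 2 3  (2 rows, all columns)
```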

Upvotes: 0

fmic_

Reputation: 2446

If you want to create a data frame of 50 observations with replacement from your data frame, you can try:

mydata[sample(nrow(mydata), 50, replace=TRUE), ]

Alternatively, you can use the sample_n function from the dplyr package, passing replace = TRUE to match the call above:

library(dplyr)
sample_n(mydata, 50, replace = TRUE)
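
As a quick check of the base R approach, here is a small sketch on made-up data. The data frame toy, its columns, and the seed are assumptions for illustration, not the asker's dataset.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical toy data: 10 observations of 2 variables.
toy <- data.frame(id = 1:10, value = (1:10)^2)

# Draw 50 row indices with replacement, then subset the rows.
idx <- sample(nrow(toy), 50, replace = TRUE)
y <- toy[idx, ]

dim(y)  # 50 2  (50 observations, same 2 variables)

# Rows drawn more than once get suffixed row names such as "3.1";
# resetting them gives a clean 1..50 index.
rownames(y) <- NULL
```

Because the draw is with replacement, the sample can be larger than the original data, and repeated rows are expected.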

Upvotes: 2
