Reputation: 125
I am struggling with a subsetting task and hope to get some help from the community.
My dataset surveydata has a column landscape. I need to select all rows with landscape type 7 or 5, and also randomly select 50 rows from each of landscape types 3 and 6. I then want to create a new variable called sub in the surveydata data frame, containing 1 if the row was selected in the previous step and 0 (or NA) if it was not.
I would prefer a base R solution, but I am open to other approaches.
Here is a random dataset to illustrate the problem.
# create data: six columns of random 0/1 values
surveydata <- as.data.frame(replicate(6, sample(0:1, 1000, replace = TRUE)))
# change values of columns
surveydata$V3 <- sample(7, size = nrow(surveydata), replace = TRUE)
surveydata$V4 <- sample(5, size = nrow(surveydata), replace = TRUE)
surveydata$V5 <- sample(5, size = nrow(surveydata), replace = TRUE)
surveydata$V6 <- sample(5, size = nrow(surveydata), replace = TRUE)
# create a group column alternating between 1 and 2 (recycled over all rows)
surveydata$group <- c(1, 2)
# rename columns
colnames(surveydata)[1:6] <- c("gender", "expert", "landscape", "q1", "q2", "q3")
Upvotes: 0
Views: 424
Reputation: 174478
Here's an R method which uses sampling and indexing to achieve the results:
# Sample 50 row indices from each of landscape types 6 and 3
ss <- unlist(lapply(c(6, 3), function(type) sample(which(surveydata$landscape == type), 50)))
# Append index of all rows where landscape is 5 or 7
ss <- c(ss, with(surveydata, which(landscape == 5 | landscape == 7)))
# Create subset data frame
subset <- surveydata[ss, ]
# Create sub column to show which rows have been sampled
surveydata$sub <- numeric(nrow(surveydata))
surveydata$sub[ss] <- 1
# test result of creating sub column
head(surveydata)
#> gender expert landscape q1 q2 q3 group sub
#> 1 0 1 7 1 5 3 1 1
#> 2 1 1 5 2 2 3 2 1
#> 3 0 0 4 5 5 2 1 0
#> 4 0 0 3 5 5 4 2 0
#> 5 0 1 7 1 5 1 1 1
#> 6 1 0 7 5 1 1 2 1
# ensure subsetted data frame is as expected
head(subset)
#> gender expert landscape q1 q2 q3 group
#> 348 0 0 6 5 4 2 2
#> 333 1 1 6 4 2 4 1
#> 521 1 0 6 1 5 5 1
#> 522 1 0 6 4 5 2 2
#> 563 0 1 6 2 4 2 1
#> 13 0 0 6 5 2 4 1
Created on 2020-07-08 by the reprex package (v0.3.0)
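Since the question asks for 50 rows from each of types 3 and 6, it can help to check the counts per landscape type after flagging. The following is only a minimal base R sketch under stated assumptions: toy is made-up stand-in data, and select_rows is a hypothetical helper name, not part of the original answer or of any package.

```r
# Toy data standing in for surveydata (hypothetical; only `landscape` matters here)
set.seed(42)
toy <- data.frame(landscape = sample(7, size = 1000, replace = TRUE))

# Hypothetical helper: returns the row indices of every row whose type is in
# `keep_all`, plus `n` randomly chosen rows from each type in `sample_from`
select_rows <- function(x, keep_all, sample_from, n) {
  sampled <- unlist(lapply(sample_from, function(type) {
    idx <- which(x == type)
    # Sample positions within idx, so a single matching row is never
    # misread by sample() as the range 1:idx; cap n at the rows available
    idx[sample.int(length(idx), min(n, length(idx)))]
  }))
  c(which(x %in% keep_all), sampled)
}

ss <- select_rows(toy$landscape, keep_all = c(5, 7), sample_from = c(3, 6), n = 50)

# Flag selected rows with 1 and all other rows with 0
toy$sub <- as.integer(seq_len(nrow(toy)) %in% ss)

# Cross-tabulate landscape type against the flag to inspect the counts
table(toy$landscape, toy$sub)
```

In the resulting table, types 5 and 7 should appear only under sub = 1, types 3 and 6 should each have 50 rows under sub = 1, and the remaining types should appear only under sub = 0. Capping the sample size at the number of available rows is a design choice of this sketch; raising an error instead may be preferable if fewer than 50 rows would indicate a data problem.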
Upvotes: 1