Reputation: 21
I have a question that I can't find an answer to on this forum. I have been trying to filter my data set in R by one of its columns. The data set has 7321 rows and 28 columns. One column, typ, which records the type of business, takes four different values: Wirtschaft, Hochschule, außeruniversitäre Forschung and Sonstige. I would like to combine Wirtschaft and außeruniversitäre Forschung in a new column called private, and put Hochschule into a new column called public. So far, I have tried the following:
First I tried creating a subset that includes both 'private' types:
subdataprivate <- subset(data, typ == "außeruniversitäre Forschung" & typ == "Wirtschaft")
The problem is that I get a subset with 0 observations of 28 variables. When I run the two conditions separately, like this:
subdataprivate1 <- subset(data, typ == "außeruniversitäre Forschung")
subdataprivate2 <- subset(data, typ == "Wirtschaft")
I do get observations for each type (1559 observations of 28 variables and 3548 observations of 28 variables, respectively). However, I need these two types of businesses combined to run my analyses.
The same problem occurs when I try filtering the data using the dplyr package. Could anyone please tell me what I am doing wrong? I'm rather new to R and this forum, so I apologise in advance for my layman's way of asking this question.
Upvotes: 1
Views: 1035
Reputation: 4338
Since you haven't posted your data I have to use dummy data, but I'd do something like this using mutate and if_else. Once you have this column, you can subset as you did with base R or use filter from dplyr.
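library(tidyverse)

# Dummy data: four school labels, each repeated five times
data <- tibble(school = rep(c("school 1", "school 2", "school 3", "school 4"), 5))

# Label "school 1" and "school 2" as private, everything else as public
data_transformed <- data %>%
  mutate(private_public = if_else(school == "school 1" | school == "school 2",
                                  "private",
                                  "public"))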
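As for why the subset in the question comes back empty: a single value of typ can never equal both strings at the same time, so combining the two comparisons with & matches no rows. Using | or %in% keeps rows that match either value. Below is a minimal base R sketch of that idea. The small data frame, the id column, and the names private_types, sub_and, subdatapublic and sector are made up for illustration, and leaving Sonstige as NA is an assumption about what you want, not something taken from your data.

```r
# Hypothetical stand-in for the real data set; only the typ column matters here.
data <- data.frame(
  id  = 1:8,
  typ = c("Wirtschaft", "Hochschule", "außeruniversitäre Forschung", "Sonstige",
          "Wirtschaft", "Hochschule", "Wirtschaft", "außeruniversitäre Forschung"),
  stringsAsFactors = FALSE
)

# The two values that should count as "private" (illustrative helper vector).
private_types <- c("Wirtschaft", "außeruniversitäre Forschung")

# & requires typ to equal both strings in the same row, which never happens.
sub_and <- subset(data, typ == "außeruniversitäre Forschung" & typ == "Wirtschaft")

# %in% (equivalent to chaining == with |) keeps rows matching either value.
subdataprivate <- subset(data, typ %in% private_types)
subdatapublic  <- subset(data, typ == "Hochschule")

# Alternative: a single label column instead of separate subsets.
# Rows that are neither private nor Hochschule (here: Sonstige) are set to NA.
data$sector <- ifelse(data$typ %in% private_types, "private",
               ifelse(data$typ == "Hochschule", "public", NA_character_))
```

With a label column like sector in place, later analyses can group or filter on it directly instead of keeping several separate subsets in sync.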
Upvotes: 1