Reputation: 21
I am very new to R and to programming, and I have a basic question (my first one on Stack Overflow :) ). I want to delete some rows from a data.frame and am using an if-statement to decide which ones. My code runs, but unfortunately it does not delete the correct rows; instead, I think it deletes every second row of my data frame.
"myDataVergleich" is the name of my data.frame. "myData$QUESTNNR" is the column by which is decided whether the row is supposed to stay in the dataframe or not.
for(i in 1:nrow(myDataVergleich))
{if(myData$QUESTNNR[i] != "t0_mathe" | myData$QUESTNNR[i] != "t0_bio" | myData$QUESTNNR[i] != "t0_allg2" |
myData$QUESTNNR[i] != "t7_mathe_Version1" | myData$QUESTNNR[i] != "t7_bio_Version1")
{myDataVergleich <- myDataVergleich[-c(i),] }}
What am I doing wrong?
Upvotes: 0
Views: 1394
Reputation: 171
I would need to see the error to be sure. QUESTNNR %in% strings returns TRUE or FALSE for each row, and adding the ! just negates that, so it should work fine. If you want to match only part of a string, you can use str_detect from the 'stringr' package.
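library(dplyr)    # filter()
library(stringr)  # str_detect()
library(tibble)   # tibble()
library(magrittr) # %>%
# Small example data: one character column
df <- tibble(x = c('h', 'e', 'l', 'l', '0'))
# Keep only the rows where x contains the substring 'l'
df %>% dplyr::filter(str_detect(x, 'l'))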
Upvotes: 0
Reputation: 171
Welcome to Stack Overflow and to R. I think your intuition is correct, but there are some issues. First, you say your data is called 'myDataVergleich', but inside your loop you are accessing 'myData'. So you might need to change 'myData$QUESTNNR[i]' to 'myDataVergleich$QUESTNNR[i]' in the loop.
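Two more things in the loop could plausibly explain the "every second row" symptom. The sketch below uses a made-up toy data frame (the names df and id and all values are assumptions for illustration, not your data). It shows that a chain of != joined by | is TRUE for every value, and that deleting row i inside the loop shifts the remaining rows up, so the next iteration skips one.

```r
# Toy data, invented for illustration only.
df <- data.frame(
  id = 1:6,
  QUESTNNR = c("t0_mathe", "other", "t0_bio", "other", "t0_allg2", "other"),
  stringsAsFactors = FALSE
)

# 1) `!=` joined by `|` is TRUE for every value: no value can be equal
#    to both strings at once, so at least one `!=` always holds.
always_true <- df$QUESTNNR != "t0_mathe" | df$QUESTNNR != "t0_bio"
print(always_true)

# 2) Removing row i while looping over the ORIGINAL row count shifts the
#    remaining rows up by one, so every other row survives.
shrinking <- df
for (i in 1:nrow(df)) {
  shrinking <- shrinking[-i, ]
}
print(shrinking$id)  # 2 4 6
```

Once i is larger than the number of rows left, the negative index is silently ignored, which is why the loop does not stop with an error.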
A great thing about R is that people have already figured out solutions to many common problems, and subsetting a data frame by a condition is one of them. You should look into the tidyverse family of packages, especially dplyr in this case.
install.packages('dplyr')
install.packages('magrittr')
If you want to keep the rows whose QUESTNNR is one of these strings, this code will work:
library(dplyr)
library(magrittr)
strings <- c(
"t0_mathe", "t0_bio", "t0_allg2", "t7_mathe_Version1", "t7_bio_Version1"
)
filtered_data <- myDataVergleich %>%
  dplyr::filter(QUESTNNR %in% strings)
If you want to keep the rows whose QUESTNNR is not one of these strings, this code will work:
library(dplyr)
library(magrittr)
strings <- c(
"t0_mathe", "t0_bio", "t0_allg2", "t7_mathe_Version1", "t7_bio_Version1"
)
filtered_data <- myDataVergleich %>%
  dplyr::filter(!QUESTNNR %in% strings)
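If you would rather not install any packages, the same two filters can be sketched in base R with a logical index. The data frame below is a hypothetical stand-in for myDataVergleich (the id column and the non-matching values are invented). Note that !QUESTNNR %in% strings is read as !(QUESTNNR %in% strings), because %in% binds more tightly than !.

```r
# Hypothetical stand-in for myDataVergleich; values are made up.
myDataVergleich <- data.frame(
  id = 1:5,
  QUESTNNR = c("t0_mathe", "t3_bio", "t0_bio", "t7_bio_Version1", "t9_allg"),
  stringsAsFactors = FALSE
)
strings <- c(
  "t0_mathe", "t0_bio", "t0_allg2", "t7_mathe_Version1", "t7_bio_Version1"
)

# One logical value per row: TRUE where QUESTNNR is one of the strings.
in_set <- myDataVergleich$QUESTNNR %in% strings

# Keep the matching rows, or keep everything else, in a single step.
kept_matching <- myDataVergleich[in_set, ]
kept_other <- myDataVergleich[!in_set, ]

# `!x %in% y` and `!(x %in% y)` give the same result.
same_result <- identical(
  !myDataVergleich$QUESTNNR %in% strings,
  !(myDataVergleich$QUESTNNR %in% strings)
)
```

Because the whole column is tested at once, no row is removed while another is still waiting to be checked, so the index-shifting problem of the loop cannot occur.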
Hope that helps
Upvotes: 1