youraz

Reputation: 483

How to find specific strings in dataframe using for loop?

I'm using a for loop to find every string from df2$x2 inside the strings of another data frame (df1$x1). My goal is to create a new column, df1$test, that holds the matching df2$x2 value.

For example:

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
                  Y = c(2017,2017,2018,2018,2017),
                  Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))

df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
                  Y = c(2018,2017,2018,2017,2018,2018),
                  P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))

for(i in 1:nrow(df2)){

  f <- df2[i,1]

  df1$test <- ifelse(grepl(f, df1$x1),f,"not found")

}

What should I do after the end of the loop? I know the problem is that the new column is overwritten on every iteration. I tried an "if" statement to create a new data frame and save the outputs, but it didn't work: it only writes one specific string.

Thank you in advance.

Expected output:

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
             output = c("not found","TE-D31L-2","not found","TE-D31L-2","EC20"))


Upvotes: 0

Views: 73

Answers (2)

Leonel Esteban Bracco

Reputation: 21

Do you want one new column for each string? If that is what you need, your code should be:

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
                  Y = c(2017,2017,2018,2018,2017),
                  Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))

df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
                  Y = c(2018,2017,2018,2017,2018,2018),
                  P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))

for(i in 1:nrow(df2)){

  f <- df2[i,1]
  # temporary column: TRUE where the string f occurs in x1, FALSE otherwise
  df1$test <- grepl(f, df1$x1)
  # rename the last (just-added) column to the string itself
  colnames(df1)[ncol(df1)] <- f

}

It creates a new column with a temporary name and then renames it to the string being evaluated. I also replaced "not found" with F (FALSE), but you can use whatever value you want.
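The same one-column-per-string result can also be built without an explicit loop. The following is only a sketch, not part of the original answer: it assumes the strings should be matched literally (fixed = TRUE) and that duplicated strings in df2$x2 should produce a single column. The names patterns and flags are made up for illustration.

```r
# Rebuild the example data so the block runs on its own.
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
                         "TE-D31L-2 QWE12X", "EC20 QWX12X"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x2 = c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2"),
                  stringsAsFactors = FALSE)

# Assumption: one column per distinct string, so duplicates are dropped first.
patterns <- unique(df2$x2)

# One logical column per string; sapply names the columns after the strings.
flags <- sapply(patterns, function(p) grepl(p, df1$x1, fixed = TRUE))

# Attach the logical columns to df1 (column names are kept as-is).
df1 <- cbind(df1, flags)
```

Each new column is TRUE in the rows where that string occurs in x1, for example df1[["EC20"]] is TRUE only for the last row.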

[EDIT:] If you want that expected output, you can use this code:

df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X","TE-D31L-2 QWE12X","TE-H6-1 ABC12X","TE-D31L-2 QWE12X","EC20 QWX12X"),
                  Y = c(2017,2017,2018,2018,2017),
                  Sales = c(25,50,30,40,90))
df1$x1 <- as.character(as.factor(df1$x1))

df2 <- data.frame(x2 = c("TE-T6-5","TE-D31L-2","TE-H6-15","EC500","EC20","TE-D31L-2"),
                  Y = c(2018,2017,2018,2017,2018,2018),
                  P = c(100,300,200,50,150,300))
df2$x2 <- as.character(as.factor(df2$x2))
df1$output <- "not found"

for(i in 1:nrow(df2)){
  f <- df2[i,1]
  df1$output[grepl(f, df1$x1)]<-f

}

This is very similar to what you did, but you need to index the rows that should be written. It only works when each row can have at most one match; it gets a little more complicated if a row can match more than one string. But I think that's not your problem.
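If a row could match more than one string, one possible way to handle it is to collect every matching string per row and paste them together. This is only a sketch under assumptions that are not in the question: the column name all_matches and the "; " separator are chosen for illustration, and the strings are matched literally (fixed = TRUE).

```r
# Rebuild the example data so the block runs on its own.
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
                         "TE-D31L-2 QWE12X", "EC20 QWX12X"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x2 = c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2"),
                  stringsAsFactors = FALSE)

# Drop the duplicated "TE-D31L-2" so it is not reported twice.
patterns <- unique(df2$x2)

# For each row of df1, keep every string from df2$x2 that occurs in x1.
df1$all_matches <- vapply(df1$x1, function(s) {
  hits <- patterns[vapply(patterns, function(p) grepl(p, s, fixed = TRUE), logical(1))]
  if (length(hits) == 0) "not found" else paste(hits, collapse = "; ")
}, character(1), USE.NAMES = FALSE)
```

With the example data no row matches more than one string, so all_matches is the same as the expected output column.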

Upvotes: 1

Sotos

Reputation: 51592

You simply need to split the df1$x1 strings on the space and merge (or match, since you are only interested in one variable) on df2$x2, i.e.

v1 <- sub('\\s+.*', '', df1$x1)
# match() returns positions in df2$x2, so index df2$x2 with them
df2$x2[match(v1, df2$x2)]
#[1] NA          "TE-D31L-2" NA          "TE-D31L-2" "EC20"
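To get from that vector to the output column shown in the question, the NA values can be replaced with "not found". A minimal sketch, assuming the code to look up is always the part of df1$x1 before the first whitespace; the name matched is made up for illustration.

```r
# Rebuild the example data so the block runs on its own.
df1 <- data.frame(x1 = c("TE-T6-3 XYZ12X", "TE-D31L-2 QWE12X", "TE-H6-1 ABC12X",
                         "TE-D31L-2 QWE12X", "EC20 QWX12X"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x2 = c("TE-T6-5", "TE-D31L-2", "TE-H6-15", "EC500", "EC20", "TE-D31L-2"),
                  stringsAsFactors = FALSE)

# Keep only the part of x1 before the first whitespace.
v1 <- sub("\\s+.*", "", df1$x1)

# Look each code up in df2$x2; codes that are not present come back as NA.
matched <- df2$x2[match(v1, df2$x2)]

# Replace the NAs with the "not found" label from the question.
df1$output <- ifelse(is.na(matched), "not found", matched)
```

Unlike the grepl loop, this is an exact comparison of the whole code, so "EC20" would not match a row starting with "EC200".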

Upvotes: 0
