Reputation: 137
I have two data frames. One contains the data I am trying to clean/modify(df_x) and one is a lookup table(df_y). There is a column df_x$TEXT that contains a string like "Some text - with more" and the lookup table df_y looks like this:
SORT ABB
-------------- ----
Some Text ST
I want to see if a value in df_y$SORT is in the df_x$TEXT for every row of df_x. If there is a match then take the df_y$ABB value at that matched row and add it to a new column in df_x like df_x$TEXT_ABB.
For the information above the algorithm would see that "Some Text" is in "Some text - with more" (ignoring case) so it would add the value "ST" to the column df_x$TEXT_ABB.
I know I can use match and or a combination of sapply and grep to search if it exists but I can not figure how to do this AND grab the abbreviation I would like to map it back to a new column in the original dataframe.
Upvotes: 1
Views: 73
Reputation: 12569
you can try this:
df_x <- data.frame(TEXT=c("Some Text 001", "other text", "Some Text 002"))
df_y <- read.table(header=TRUE, text=
'SORT ABB
"Some Text" ST
"Other Text" OT')
L <- sapply(df_y$SORT, grep, x=df_x$TEXT, ignore.case=TRUE)
df_x$abb <- NA
for (l in 1:length(L)) if (length(L[[l]])!=0) df_x$abb[L[[l]]] <- as.character(df_y$ABB[l])
Upvotes: 1