Reputation: 109
I have a data frame df and would like to classify each row based on the value of the column df$name. For the classification I have a two-column data frame tl with the columns tl$name and tl$type. I would like to merge the two data frames on a "like" condition, grepl(tl$name, df$name), instead of the exact match df$name == tl$name.
I have already tried looping over all rows in df and checking for a match in tl, but this seems very time-consuming.
E.g.:
df
name
# African elephant
# Indian elephant
# Silverback gorilla
# Nile crocodile
# White shark
tl
name type
# elephant mammal
# gorilla mammal
# crocodile reptile
# shark fish
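For a reproducible example, the two data frames above can be constructed as follows. This is a minimal sketch that assumes both columns are plain character vectors rather than factors:

```r
# Data frame to classify: one free-text name per row
df <- data.frame(
  name = c("African elephant", "Indian elephant", "Silverback gorilla",
           "Nile crocodile", "White shark"),
  stringsAsFactors = FALSE
)

# Lookup table: key word in name, class in type
tl <- data.frame(
  name = c("elephant", "gorilla", "crocodile", "shark"),
  type = c("mammal", "mammal", "reptile", "fish"),
  stringsAsFactors = FALSE
)
```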
Upvotes: 3
Views: 848
Reputation: 11
I think this is what you want to do
library(splitstackshape)
df <- cSplit(df, splitCols = "name", sep = " ")
The above command splits that column into two columns named name_1 and name_2.
colnames(df) <- c("prefix", "name")
The above command renames the columns so that the second word, which is the key in tl, is called name and can be used for merging.
df_tl <- merge(x = df, y = tl, by = "name", all.x = TRUE)
The above code should give you the desired output, provided every name consists of exactly two words.
Upvotes: 0
Reputation: 21641
Another idea:
library(tidyverse)
df %>%
  separate(name, into = c("t", "name")) %>%
  left_join(tl, by = "name")
Which gives:
# t name type
#1 African elephant mammal
#2 Indian elephant mammal
#3 Silverback gorilla mammal
#4 Nile crocodile reptile
#5 White shark fish
Upvotes: 1
Reputation: 887841
We can remove the substring with sub by matching one or more non-whitespace characters (\\S+) followed by one or more whitespace characters (\\s+) from the start (^) of the string, replace it with blank ("") and merge with the second dataset ('tl')
merge(transform(df, name = sub("^\\S+\\s+", "", name)), tl)
# name type
#1 crocodile reptile
#2 elephant mammal
#3 elephant mammal
#4 gorilla mammal
#5 shark fish
If we need to update the first dataset,
df$type <- with(df, tl$type[match(sub("^\\S+\\s+", "", name), tl$name)])
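Both variants above assume that every name is a single leading word followed by the key. When that does not hold, a substring lookup closer to the grepl condition in the question can be sketched in base R. This is only a sketch under some assumptions: match_first is a hypothetical helper, fixed = TRUE treats the keys as literal strings rather than regular expressions, and only the first matching key is used if several match.

```r
df <- data.frame(
  name = c("African elephant", "Indian elephant", "Silverback gorilla",
           "Nile crocodile", "White shark"),
  stringsAsFactors = FALSE
)
tl <- data.frame(
  name = c("elephant", "gorilla", "crocodile", "shark"),
  type = c("mammal", "mammal", "reptile", "fish"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the first pattern found inside x, or NA
match_first <- function(x, patterns) {
  hit <- which(vapply(patterns, grepl, logical(1), x = x, fixed = TRUE))
  if (length(hit) > 0) hit[[1]] else NA_integer_
}

# One lookup index per row of df, then pull the corresponding type
idx <- vapply(df$name, match_first, integer(1), patterns = tl$name,
              USE.NAMES = FALSE)
df$type <- tl$type[idx]
```

Rows without any matching key get NA in df$type. This still evaluates grepl once per combination of row and key, so for large inputs it is not necessarily faster than an explicit loop.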
Upvotes: 0