Reputation: 1321
I have a dataframe that looks like this
df=data.frame(ID=c(1,2,3),hashtag=c('c("#job", "#inclusion<U+0085>", "#driver", "#splitme")','c("#job", "#inclusion<U+0085>", "#driver")','c("#job", "#inclusion<U+0085>")'))
I'd like to do some cleaning up first, then split the hashtag column into multiple columns based on the number of hashtags in each cell. For example, the first row has 4 hashtags, so it should be split into four columns containing #job, #inclusion, #driver and #splitme.
I tried the following:
# Clean up
# Remove double quotes
df$hashtag <- gsub('"', '', df$hashtag)
# Remove the c( ) wrapper
df$hashtag <- gsub("c\\(|\\)", "", df$hashtag)
# Then split into columns (separate() and %>% are available from tidyr)
library(tidyr)
df_split <- df %>% separate(hashtag, c("A", "B", "C", "D"), sep = ", ", extra = "drop")
When I try to remove the unicode using the following line of code, nothing happens.
#Remove unicode
df$hashtag <-gsub("\\<|\\>", "", df$hashtag)
Any ideas on what could be the right solution to this?
Upvotes: 1
Views: 838
Reputation: 13591
You didn't specify the desired output, but you could follow this approach:
# vector of hashtag column
v <- df$hashtag
w <- gsub("[#]", "", v)
# [1] "job, inclusion<U+0085>, driver, splitme"
# [2] "job, inclusion<U+0085>, driver"
# [3] "job, inclusion<U+0085>"
# [^>]+ stops at the first ">", so several <...> tokens in one string are removed separately
ans <- gsub("<[^>]+>", "", w)
# [1] "job, inclusion, driver, splitme" "job, inclusion, driver"
# [3] "job, inclusion"
# split on ", " so the pieces carry no leading space
unlist(strsplit(ans, ", "))
# [1] "job" "inclusion" "driver" "splitme" "job" "inclusion"
# [7] "driver" "job" "inclusion"
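If the goal is one column per hashtag, as the question describes, the steps above can be combined into a base R sketch along these lines. It assumes the `<U+0085>` markers are literal text in the strings (as in the sample data) rather than real Unicode characters, and the column names A, B, C, ... are only placeholders. The original `gsub("\\<|\\>", ...)` most likely did nothing because `\\<` and `\\>` are word-boundary anchors in R's default regex engine, so the angle brackets are left unescaped here.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps hashtag as character
df <- data.frame(
  ID = c(1, 2, 3),
  hashtag = c('c("#job", "#inclusion<U+0085>", "#driver", "#splitme")',
              'c("#job", "#inclusion<U+0085>", "#driver")',
              'c("#job", "#inclusion<U+0085>")'),
  stringsAsFactors = FALSE
)

# Strip the double quotes and the c( ) wrapper
clean <- gsub('"|c\\(|\\)', "", df$hashtag)
# Strip literal <U+XXXX> escapes; "<" and ">" need no escaping
clean <- gsub("<U\\+[0-9A-Fa-f]+>", "", clean)

# Split each row on ", " and pad shorter rows with NA
parts <- strsplit(clean, ", ", fixed = TRUE)
n_max <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n_max - length(p))))
mat <- do.call(rbind, padded)
colnames(mat) <- LETTERS[seq_len(n_max)]  # placeholder names: A, B, C, ...

# One row per ID, one column per hashtag position
df_split <- cbind(df["ID"], as.data.frame(mat, stringsAsFactors = FALSE))
```

Rows with fewer hashtags get NA in the trailing columns, and the number of columns follows the longest row rather than being fixed at four.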
Upvotes: 1