Reputation: 9
The following names are in a column. I want to retain just five distinct names, while replace the rest with others. how do I go about that?
df <- data.frame(names = c('Marvel Comics','Dark Horse Comics','DC Comics','NBC - Heroes','Wildstorm',
'Image Comics',NA,'Icon Comics',
'SyFy','Hanna-Barbera','George Lucas','Team Epic TV','South Park',
'HarperCollins','ABC Studios','Universal Studios','Star Trek','IDW Publishing',
'Shueisha','Sony Pictures','J. K. Rowling','Titan Books','Rebellion','Microsoft',
'J. R. R. Tolkien'))
Upvotes: 0
Views: 234
Reputation: 1163
You can also use something like case when. The following code will keep marvel, dark horse, dc comics, JK Rowling and George Lucas the same and change all others to "Other". It functionally the same as u/jpsmith, but (in my humble opinion) offers a little more flexibility because you can change multiple things a bit more easily or make different comics have the same name should you choose to do so.
df = df %>%
mutate(new_names = case_when(names == 'Marvel Comics' ~ 'Marvel Comics',
names == 'Dark Horse Comics' ~ 'Dark Horse Comics',
names == 'DC Comics' ~ 'DC Comics',
names == 'George Lucas' ~ 'George Lucas',
names == 'J. K. Rowling' ~ 'J. K. Rowling',
TRUE ~ "Other"))
Upvotes: 0
Reputation: 17215
If I am understanding you correctly, use %in%
and ifelse
. Here, I chose the first five names as an example. I also created it in a new column, but you could just overwrite the column as well or create a vector:
df <- data.frame(names = c('Marvel Comics','Dark Horse Comics','DC Comics','NBC - Heroes','Wildstorm',
'Image Comics',NA,'Icon Comics',
'SyFy','Hanna-Barbera','George Lucas','Team Epic TV','South Park',
'HarperCollins','ABC Studios','Universal Studios','Star Trek','IDW Publishing',
'Shueisha','Sony Pictures','J. K. Rowling','Titan Books','Rebellion','Microsoft',
'J. R. R. Tolkien'))
fivenamez <- c('Marvel Comics','Dark Horse Comics','DC Comics','NBC - Heroes','Wildstorm')
df$names_transformed <- ifelse(df$names %in% fivenamez, df$names, "Other")
# names names_transformed
# 1 Marvel Comics Marvel Comics
# 2 Dark Horse Comics Dark Horse Comics
# 3 DC Comics DC Comics
# 4 NBC - Heroes NBC - Heroes
# 5 Wildstorm Wildstorm
# 6 Image Comics Other
# 7 <NA> Other
# 8 Icon Comics Other
# 9 SyFy Other
If you want to keep NA
values as NA
, just use df$names_transformed <- ifelse(df$names %in% fivenamez | is.na(df$names), df$names, "Other")
Upvotes: 1