If column has matching string, write new string to new column

Question

I'm trying to use the Phone column to isolate the phone brand and then write the brand to a new column, Brand.

Original:

   Phone
Samsung note
Samsung note
Nokia lumia
Sony xperia

Desired:

   Phone          Brand
Samsung note 3   Samsung
Samsung note 4   Samsung
Nokia lumia       Nokia
Sony xperia       Sony

The problems I'm running into are: 1) I don't know how to create the 'Brand' column with a specific string on the condition that the 'Phone' column contains a specific string, and 2) I need to do this for multiple brands and have the 'Brand' column reflect each one.

What is the most elegant way to do this? And is there a dplyr method to do this using mutate?

akrun · Accepted Answer

This can be done in base R. We can use sub to strip the trailing part of each string in the 'Phone' column. The pattern matches one or more whitespace characters (\\s+) followed by zero or more characters (.*) up to the end of the string ($), and replaces the match with ''. Note that the backslash has to be doubled inside an R string literal.

df1$Brand <- sub('\\s+.*$', '', df1$Phone)
df1
#         Phone   Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3  Nokia lumia   Nokia
#4  Sony xperia    Sony
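Since the question asks about dplyr, the same substitution can be placed inside mutate. This is a minimal sketch rather than a different technique: it assumes dplyr is installed, rebuilds the example data so it runs on its own, and uses a hypothetical result name, df1_brand.

```r
library(dplyr)

# Same values as df1 in the "data" section
df1 <- data.frame(
  Phone = c("Samsung note", "Samsung note", "Nokia lumia", "Sony xperia"),
  stringsAsFactors = FALSE
)

# Remove everything from the first run of whitespace onward,
# which leaves the brand, and store it in a new column
df1_brand <- df1 %>%
  mutate(Brand = sub("\\s+.*$", "", Phone))

df1_brand
```

mutate keeps the original 'Phone' column and appends 'Brand', so the result has the same shape as the base R version.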

Another option is extract from library(tidyr). However, I would use extract only when a column needs to be split into multiple columns. In this case, we are keeping the original column and creating a single new one.

library(tidyr)
extract(df1, Phone, into= 'Brand', '([^ ]+).*', remove=FALSE)
#         Phone   Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3  Nokia lumia   Nokia
#4  Sony xperia    Sony
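To see what the whitespace pattern does and does not handle, here is a quick check on a few made-up strings (the vector phones and its values are illustrative, not from the question). A string with no whitespace has nothing to match, so sub returns it unchanged.

```r
# Illustrative inputs: a single space, repeated spaces, and no space at all
phones <- c("Samsung note", "Nokia  lumia 920", "Samsungnote")

# Everything from the first run of whitespace to the end is removed
brands <- sub("\\s+.*$", "", phones)
brands
# [1] "Samsung"     "Nokia"       "Samsungnote"
```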

UPDATE: As mentioned in the comments, if we also have strings without a space, such as 'Samsungnote' or 'Nokialumia', one option is to split/unsplit by a grouping variable. After the sub step, we use substr to take a prefix from each string, with the prefix length set to the minimum number of characters across all strings, and split by that prefix. Within each group, we truncate every element to the length of the shortest element in the group, and then unsplit to restore the original order.

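# Drop everything after the first run of whitespace
v1 <- sub('\\s+.*$', '', df2$Phone)
# Grouping key: a prefix as long as the shortest string in v1
gr <- substr(v1, 1, min(nchar(v1)))
lst <- split(v1, gr)
# Within each group, truncate to the shortest element, then restore the original order
df2$Brand <- unsplit(lapply(lst, function(x) substr(x, 1, min(nchar(x)))), gr)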
df2
#         Phone   Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3  Nokia lumia   Nokia
#4  Sony xperia    Sony
#5  Samsungnote Samsung
#6   Nokialumia   Nokia

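NOTE: This may not work in all cases. It assumes that each brand appears at least once followed by a space (for example 'Samsung note'), and that the prefix used as the grouping key is long enough to tell the brands apart.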
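One case where the split/unsplit approach falls short, and a possible alternative, can be sketched as follows. The input vector is hypothetical, and the alternative assumes the set of brands is known in advance (known_brands is an assumed list, not something given in the question), so treat it as one option rather than a general solution.

```r
# Hypothetical input: some brands never appear followed by a space
phones <- c("Samsungnote", "Nokialumia", "Sony xperia")

# The split/unsplit approach: each group has a single member here,
# so nothing is truncated and the model name stays attached
v1 <- sub("\\s+.*$", "", phones)
gr <- substr(v1, 1, min(nchar(v1)))
by_prefix <- unsplit(lapply(split(v1, gr), function(x) substr(x, 1, min(nchar(x)))), gr)
by_prefix
# [1] "Samsungnote" "Nokialumia"  "Sony"

# Alternative: match against an assumed list of known brands
known_brands <- c("Samsung", "Nokia", "Sony")
m <- regexpr(paste(known_brands, collapse = "|"), phones)

# regmatches returns only the matched pieces, so fill unmatched rows with NA
by_lookup <- rep(NA_character_, length(phones))
by_lookup[m > 0] <- regmatches(phones, m)
by_lookup
# [1] "Samsung" "Nokia"   "Sony"
```

The lookup is case-sensitive as written, and rows that match no known brand are left as NA.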

data

df1 <- structure(list(Phone = c("Samsung note", "Samsung note", 
"Nokia lumia", 
"Sony xperia")), .Names = "Phone", class = "data.frame", 
row.names = c(NA, -4L))

df2 <- structure(list(Phone = c("Samsung note", "Samsung note", 
"Nokia lumia", 
"Sony xperia", "Samsungnote", "Nokialumia")), .Names = "Phone", 
class =  "data.frame", row.names = c(NA, -6L))