ant

Reputation: 585

If column has matching string, write new string to new column

I'm trying to use the Phone column to isolate the phone brand and then write the brand to a new column, Brand.

Original:

   Phone
Samsung note
Samsung note
Nokia lumia
Sony xperia

Desired:

Phone            Brand
Samsung note 3   Samsung
Samsung note 4   Samsung
Nokia lumia      Nokia
Sony xperia      Sony

The problems I'm running into are: 1) I don't know how to create the 'Brand' column with a specific string on the condition that the 'Phone' column contains a specific string, and 2) how to do this for multiple brands and have the 'Brand' column reflect that.

What is the most elegant way to do this? And is there a dplyr method to do this using mutate?

Upvotes: 1

Views: 69

Answers (2)

akrun

Reputation: 886938

This can be done in base R. We can use sub to remove everything after the first word of the 'Phone' column: we match one or more whitespace characters (\\s+) followed by zero or more characters (.*) up to the end of the string ($), and replace that match with ''.

df1$Brand <- sub('\\s+.*$', '', df1$Phone)
df1
#         Phone   Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3  Nokia lumia   Nokia
#4  Sony xperia    Sony
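Since the question also asks about dplyr, here is a sketch of the same idea inside mutate(), assuming dplyr is installed. The first column reuses the sub() regex from above; the second (named Brand2 here purely for comparison) uses case_when() with one grepl() condition per brand, which is closer to the "if Phone contains X, write Y" wording of the question. The brand list in case_when() is an assumption and has to be extended by hand; grepl() is case-sensitive, and rows matching no brand get NA.

```r
library(dplyr)

df1 <- data.frame(Phone = c("Samsung note", "Samsung note", "Nokia lumia", "Sony xperia"),
                  stringsAsFactors = FALSE)

# Option 1: same regex as the base R sub() call, wrapped in mutate()
df1 <- mutate(df1, Brand = sub("\\s+.*$", "", Phone))

# Option 2: one explicit condition per (assumed) brand;
# rows that match none of the patterns get NA
df1 <- mutate(df1, Brand2 = case_when(
  grepl("Samsung", Phone) ~ "Samsung",
  grepl("Nokia", Phone)   ~ "Nokia",
  grepl("Sony", Phone)    ~ "Sony",
  TRUE                    ~ NA_character_
))
df1
```

The sub() version needs no brand list but assumes the brand is always the first word; the case_when() version works wherever the brand appears in the string, at the cost of listing every brand.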

Another option is extract() from the tidyr package. However, I would use extract only when a column needs to be split into multiple columns. Here we keep the original column and create just one new column.

library(tidyr)
extract(df1, Phone, into= 'Brand', '([^ ]+).*', remove=FALSE)
#         Phone   Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3  Nokia lumia   Nokia
#4  Sony xperia    Sony

UPDATE: As mentioned in the comments, if some strings have no space, such as 'Samsungnote' or 'Nokialumia', one option is to split/unsplit by a grouping variable. After the sub step, we use substr to take a prefix of each string whose length is the minimum number of characters across all strings, split by that prefix, trim every string in each list element to the length of the shortest string in that element, and unsplit.

v1 <-  sub('\\s+.*$', '', df2$Phone)
gr <- substr(v1, 1, min(nchar(v1)))
lst <- split(v1, gr)
df2$Brand <- unsplit(lapply(lst, function(x) substr(x, 1, min(nchar(x)))), gr)
df2
#         Phone   Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3  Nokia lumia   Nokia
#4  Sony xperia    Sony
#5  Samsungnote Samsung
#6   Nokialumia   Nokia

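NOTE: This may not work in all cases. It assumes that the shortest string in each group is the brand itself, so it fails, for example, when a brand appears only in its run-together form (a lone 'Samsungnote' would stay 'Samsungnote'), or when two different brands share the same first min(nchar(v1)) characters.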
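If the set of brands is known in advance, a different technique (not part of the original answer) avoids the prefix heuristic entirely: build an anchored alternation regex from the brand names and pull out the match with base R's regexpr() and regmatches(). The brands vector below is an assumption for illustration; brand names containing regex metacharacters would need escaping, and rows that start with none of the listed brands get NA.

```r
# Hypothetical list of known brands
brands <- c("Samsung", "Nokia", "Sony")

df2 <- data.frame(Phone = c("Samsung note", "Samsung note", "Nokia lumia",
                            "Sony xperia", "Samsungnote", "Nokialumia"),
                  stringsAsFactors = FALSE)

# "^(Samsung|Nokia|Sony)": a brand at the very start, with or without a space after it
pattern <- paste0("^(", paste(brands, collapse = "|"), ")")

# regexpr() gives -1 for rows without a match; regmatches() returns only the matches
m <- regexpr(pattern, df2$Phone)
df2$Brand <- NA_character_
df2$Brand[m != -1] <- regmatches(df2$Phone, m)
df2
```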

data

df1 <- structure(list(Phone = c("Samsung note", "Samsung note", 
"Nokia lumia", 
"Sony xperia")), .Names = "Phone", class = "data.frame", 
row.names = c(NA, -4L))

df2 <- structure(list(Phone = c("Samsung note", "Samsung note", 
"Nokia lumia", 
"Sony xperia", "Samsungnote", "Nokialumia")), .Names = "Phone", 
class =  "data.frame", row.names = c(NA, -6L))

Upvotes: 3

Colonel Beauvel

Reputation: 31161

If each entry in the Phone column has several space-separated elements, you can use cSplit from the splitstackshape package and keep the first piece:

library(splitstackshape)
cbind(df1, cSplit(df1, 'Phone', sep=' ')[, 1, with = FALSE])
#           Phone Phone_1
#1 Samsung note 3 Samsung
#2 Samsung note 4 Samsung
#3    Nokia lumia   Nokia
#4    Sony xperia    Sony
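Without the extra package, a base R sketch of the same "keep the first space-separated token" idea uses strsplit(), which returns a list with one character vector per row. This assumes single spaces separate the words and names the new column Brand instead of Phone_1.

```r
df1 <- data.frame(Phone = c("Samsung note 3", "Samsung note 4", "Nokia lumia",
                            "Sony xperia"),
                  stringsAsFactors = FALSE)

# Split each string on a literal space, then take the first element of each piece
df1$Brand <- vapply(strsplit(df1$Phone, " ", fixed = TRUE), `[`, character(1), 1)
df1
```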

Data:

df1 <- structure(list(Phone = c("Samsung note 3", "Samsung note 4", "Nokia lumia", 
"Sony xperia")), .Names = "Phone", class = "data.frame", row.names = c(NA, -4L))

Upvotes: 2
