Reputation: 585
I'm trying to use column A (Phone) to isolate the phone brand and then write the brand to a new column, Brand.
Original:
Phone
Samsung note
Samsung note
Nokia lumia
Sony xperia
Desired:
Phone           Brand
Samsung note 3  Samsung
Samsung note 4  Samsung
Nokia lumia     Nokia
Sony xperia     Sony
The problems I'm running into are: 1) I don't know how to create the 'Brand' column with a specific string on the condition that the 'Phone' column contains a specific string, and 2) I don't know how to do this for multiple brands and have the 'Brand' column reflect that.
What is the most elegant way to do this? And is there a dplyr method to do this using mutate?
Upvotes: 1
Views: 69
Reputation: 886938
This could be done using base R. We can use sub to remove part of the string in the 'Phone' column. We match one or more spaces (\\s+) followed by zero or more characters (.*) until the end ($) of the string and replace it with ''.
df1$Brand <- sub('\\s+.*$', '', df1$Phone)
df1
# Phone Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3 Nokia lumia Nokia
#4 Sony xperia Sony
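Since the question also asks about dplyr, the same sub call can be placed inside mutate. This is a minimal sketch, assuming dplyr is installed; the object names phones and phones_with_brand are hypothetical and only used for illustration.

```r
library(dplyr)

# Illustrative data mirroring the question (hypothetical object name)
phones <- data.frame(Phone = c("Samsung note", "Samsung note",
                               "Nokia lumia", "Sony xperia"),
                     stringsAsFactors = FALSE)

# Drop everything from the first run of whitespace onward,
# which leaves only the first word of each string
phones_with_brand <- mutate(phones, Brand = sub("\\s+.*$", "", Phone))
phones_with_brand
```

The regular expression is unchanged from the base R version; mutate only saves the df1$ prefixes and returns a new data frame instead of modifying the original in place.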
Another option is extract from library(tidyr). However, I would use extract only if we needed to split a column into multiple columns. In this case, we are keeping the original column and creating only a single new column.
library(tidyr)
extract(df1, Phone, into= 'Brand', '([^ ]+).*', remove=FALSE)
# Phone Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3 Nokia lumia Nokia
#4 Sony xperia Sony
UPDATE: As mentioned in the comments, if we have strings such as 'Samsungnote' or 'Nokialumia', one option is to split/unsplit by a grouping variable created from the minimum number of characters after the sub step. We use substr to extract the prefix part of the string, split by that, then trim each list element to the length of the shortest string in its group, and unsplit.
v1 <- sub('\\s+.*$', '', df2$Phone)
gr <- substr(v1, 1, min(nchar(v1)))
lst <- split(v1, gr)
df2$Brand <- unsplit(lapply(lst, function(x) substr(x, 1, min(nchar(x)))), gr)
df2
# Phone Brand
#1 Samsung note Samsung
#2 Samsung note Samsung
#3 Nokia lumia Nokia
#4 Sony xperia Sony
#5 Samsungnote Samsung
#6 Nokialumia Nokia
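NOTE: This may not work in all cases. It relies on each brand appearing at least once on its own (followed by a space) and on the brands being distinguishable by their first few characters, where the prefix length is that of the shortest string left after the sub step.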
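If the set of brands is known in advance, a more predictable alternative to the split/unsplit approach above is to match each string against that list. The following is only a sketch under that assumption: the brands vector, the phones vector, and the extra "HTC one" entry are hypothetical and not part of the original data.

```r
# Hypothetical list of known brands; extend as needed
brands <- c("Samsung", "Nokia", "Sony")

# Illustrative input, including concatenated names and one unknown brand
phones <- c("Samsung note", "Nokia lumia", "Sony xperia",
            "Samsungnote", "Nokialumia", "HTC one")

# Build an anchored alternation: ^(Samsung|Nokia|Sony)
pattern <- paste0("^(", paste(brands, collapse = "|"), ")")

# regexpr() returns -1 where nothing matches; those entries stay NA
m <- regexpr(pattern, phones)
brand <- rep(NA_character_, length(phones))
brand[m != -1] <- regmatches(phones, m)
brand
```

Matching here is case-sensitive and only looks at the start of each string, so a phone whose brand is not in the list ends up as NA rather than being guessed.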
df1 <- structure(list(Phone = c("Samsung note", "Samsung note",
"Nokia lumia",
"Sony xperia")), .Names = "Phone", class = "data.frame",
row.names = c(NA, -4L))
df2 <- structure(list(Phone = c("Samsung note", "Samsung note",
"Nokia lumia",
"Sony xperia", "Samsungnote", "Nokialumia")), .Names = "Phone",
class = "data.frame", row.names = c(NA, -6L))
Upvotes: 3
Reputation: 31161
If each row of the Phone column contains several elements, you can use cSplit from the splitstackshape package:
library(splitstackshape)
cbind(df1, cSplit(df1, 'Phone', sep=' ')[, 1, with=FALSE])
# Phone Phone_1
#1 Samsung note 3 Samsung
#2 Samsung note 4 Samsung
#3 Nokia lumia Nokia
#4 Sony xperia Sony
Data:
df1 <- structure(list(Phone = c("Samsung note 3", "Samsung note 4", "Nokia lumia",
"Sony xperia")), .Names = "Phone", class = "data.frame", row.names = c(NA, -4L))
Upvotes: 2