Reputation: 2429
I used R code from a user who answered my previous question (see here: Split values of a column in a data frame by specific tag and add them as extra row) to achieve the following:
df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"))
> df
  var1      var2
1    A      test
2    B     5 | 6
3    C     X & Y
4    D M | N | O
5    E      none
t1 <- c("", "|")[df$var1 %in% df$var1[grep("\\|", df$var2)]+1]
t2 <- c("", "&")[df$var1 %in% df$var1[grep("&", df$var2)]+1]
t1[which(t2 == "&")] <- "&"
df$var3 <- t1
> df
  var1      var2 var3
1    A      test
2    B     5 | 6    |
3    C     X & Y    &
4    D M | N | O    |
5    E      none
I was just wondering if there is a better way of doing it as I really want to improve the way I do my R coding. For me, this wasn't really a simple task to achieve but I am willing to learn :-)
Upvotes: 0
Views: 736
Reputation: 193517
Assuming your data is really this nicely organized, with proper spacing and so on, you can use gsub along with substring.
df$var3 = substring(gsub("([a-zA-Z0-9 ])", "", df$var2), 1, 1)
df
#   var1      var2 var3
# 1    A      test
# 2    B     5 | 6    |
# 3    C     X & Y    &
# 4    D M | N | O    |
# 5    E      none
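Here, gsub strips the letters, digits, and spaces, and substring (or substr) with start and stop both set to 1 keeps only the first character of what remains.
A more general approach, assuming there might be different punctuation marks in var2, would be: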
gsub("[^[:punct:]]", "", df$var2)
# [1] "" "|" "&" "||" ""
Again, using substr would allow you to select only the first character in each string.
substr(gsub("[^[:punct:]]", "", df$var2), 1, 1)
# [1] "" "|" "&" "|" ""
If you definitely only have those two separating characters, you can change the search pattern from [^[:punct:]] to [^|&] (inside square brackets, | and & are literal and need no escaping).
In the examples in this update, the ^ (within square brackets) means to match everything but these characters.
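A minimal sketch of the negated-class idea, run on the data frame from the question. It uses the two-separator pattern [^|&]; the name seps is only illustrative and not part of the original answer.

```r
# Data from the question
df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"),
                 stringsAsFactors = FALSE)

# Delete every character that is NOT "|" or "&"
seps <- gsub("[^|&]", "", df$var2)
seps
# values: "", "|", "&", "||", ""

# Keep only the first remaining character of each string
df$var3 <- substr(seps, 1, 1)
df$var3
# values: "", "|", "&", "|", ""
```

Rows without a separator end up with an empty string, which matches the var3 column shown in the question.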
Upvotes: 2
Reputation: 2429
I found another solution which works great for me in only one line :-)
library(stringr)
df$var3 <- str_extract(df$var2, "\\||&")
However, I should note that I do not mind the code adding NAs where no matches are found.
Thanks for all of your solutions though! Great work indeed!
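If empty strings are preferred over NA, the same first-match extraction can be sketched in base R with regexpr and regmatches. This is an alternative under that assumption, not the stringr approach itself; the names m and var3 are illustrative.

```r
# Data from the question
df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"),
                 stringsAsFactors = FALSE)

# Position of the first "|" or "&" in each string (-1 where there is none)
m <- regexpr("[|&]", df$var2)

# Start from empty strings, then fill in only the rows that matched;
# regmatches() returns one substring per successful match
var3 <- rep("", nrow(df))
var3[m != -1] <- regmatches(df$var2, m)
df$var3 <- var3
df$var3
# values: "", "|", "&", "|", ""
```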
Upvotes: 0
Reputation: 7475
grepl("\\|",df$var2)
grepl("&",df$var2)
is the same as
df$var1 %in% df$var1[grep("\\|", df$var2)]
df$var1 %in% df$var1[grep("&", df$var2)]
so you can use for example
ifelse(grepl("\\|",df$var2),'|','')
ifelse(grepl("&",df$var2),'&','')
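The two ifelse calls above produce two separate vectors. One way to combine them into a single column is a nested ifelse, sketched below. Checking "&" first mirrors the question's code, where "&" overrides "|"; the names has_pipe and has_amp are illustrative.

```r
# Data from the question
df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"),
                 stringsAsFactors = FALSE)

# fixed = TRUE treats the pattern as a literal string, so "|" needs no escaping
has_pipe <- grepl("|", df$var2, fixed = TRUE)
has_amp  <- grepl("&", df$var2, fixed = TRUE)

# "&" wins if both are present, otherwise "|", otherwise an empty string
df$var3 <- ifelse(has_amp, "&", ifelse(has_pipe, "|", ""))
df$var3
# values: "", "|", "&", "|", ""
```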
Upvotes: 0
Reputation: 12411
You can use this instead:
t3 <- rep("",length(df$var1))
t3[which(grepl("&",df$var2))] <- "&"
t3[which(grepl("\\|",df$var2))] <- "|"
df$var3 <- t3
Upvotes: 1