user969113

Reputation: 2429

Find a specific tag in a column of a data frame and add the tag in a separate column of the same data frame

I used R code from a user who answered my previous question (see here: Split values of a column in a data frame by specific tag and add them as extra row) to achieve the following:

df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"))

> df
  var1      var2
1    A      test
2    B     5 | 6
3    C     X & Y
4    D M | N | O
5    E      none


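# "|" for rows whose var2 contains a pipe, "" otherwise (FALSE + 1 picks "", TRUE + 1 picks "|")
t1 <- c("", "|")[df$var1 %in% df$var1[grep("\\|", df$var2)] + 1]
# same idea for "&"
t2 <- c("", "&")[df$var1 %in% df$var1[grep("&", df$var2)] + 1]
# overwrite t1 with "&" wherever t2 found one
t1[which(t2 == "&")] <- "&"
df$var3 <- t1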


> df
  var1      var2 var3
1    A      test     
2    B     5 | 6    |
3    C     X & Y    &
4    D M | N | O    |
5    E      none     

I was just wondering whether there is a better way of doing this, as I really want to improve my R coding. For me, this wasn't a simple task, but I am willing to learn :-)

Upvotes: 0

Views: 736

Answers (4)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Assuming your data is really this nicely organized, with proper spacing and so on, you can use gsub along with substring.

df$var3 = substring(gsub("([a-zA-Z0-9 ])", "", df$var2), 1, 1)
df
#   var1      var2 var3
# 1    A      test     
# 2    B     5 | 6    |
# 3    C     X & Y    &
# 4    D M | N | O    |
# 5    E      none
  1. For your search pattern, look for all letters, numbers, and spaces, and replace them with nothing.
  2. Then use substring (or substr) with both start and stop set to 1 to keep only the first remaining character.
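The two steps can be run separately to inspect the intermediate result. A minimal sketch, assuming var2 is a character column (stringsAsFactors = FALSE, which is the default since R 4.0.0); the name stripped is just an illustrative variable:

```r
df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"),
                 stringsAsFactors = FALSE)

# Step 1: delete every letter, digit and space, leaving only the separators
stripped <- gsub("[a-zA-Z0-9 ]", "", df$var2)
print(stripped)  # c("", "|", "&", "||", "")

# Step 2: keep only the first remaining character ("" stays "")
df$var3 <- substring(stripped, 1, 1)
print(df)
```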

Update

A more general approach, assuming there might be different punctuation marks in var2, would be:

gsub("[^[:punct:]]", "", df$var2)
# [1] ""   "|"  "&"  "||" ""  

Again, using substr would allow you to select only the first character in each string.

substr(gsub("[^[:punct:]]", "", df$var2), 1, 1)
# [1] ""  "|" "&" "|" "" 

If you definitely only have those two separating characters, you can change the search pattern from [^[:punct:]] to [^|&]. Inside square brackets, | and & are literal characters, so they need no escaping.

In the examples in this update, the ^ (within square brackets) means to match everything but these characters.
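To illustrate what the ^ does, here is a short sketch comparing the same bracket expression with and without it, using the simpler [|&] class (assuming | and & are the only separators):

```r
x <- c("test", "5 | 6", "X & Y", "M | N | O", "none")

# Without ^: the class matches the separators, so gsub() removes them
without_caret <- gsub("[|&]", "", x)
# c("test", "5  6", "X  Y", "M  N  O", "none")

# With ^: the class matches everything except the separators,
# so only the separators survive
with_caret <- gsub("[^|&]", "", x)
# c("", "|", "&", "||", "")
```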

Upvotes: 2

user969113

Reputation: 2429

I found another solution which works great for me in only one line :-)

library(stringr)

df$var3 <- str_extract(df$var2, "\\||&")

However, I should say that I don't mind the code returning NA where no match is found.
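If you would rather avoid the stringr dependency, base R's regexpr() and regmatches() can produce the same result: the first match in each string, and NA where there is none. A minimal sketch; var3_blank is a hypothetical name for an optional variant that uses "" instead of NA:

```r
df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"),
                 stringsAsFactors = FALSE)

# Position of the first "|" or "&" in each string (-1 when there is none)
m <- regexpr("[|&]", df$var2)

# regmatches() returns only the matched substrings, so fill the matching
# rows and leave NA elsewhere (the same shape str_extract() returns)
df$var3 <- NA_character_
df$var3[m != -1] <- regmatches(df$var2, m)

# Optional: blank strings instead of NA
var3_blank <- ifelse(is.na(df$var3), "", df$var3)
```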

Thanks for all of your solutions though! Great work indeed!

Upvotes: 0

shhhhimhuntingrabbits

Reputation: 7475

grepl("\\|",df$var2)
grepl("&",df$var2)

is equivalent (provided the values in var1 are unique) to

df$var1 %in% df$var1[grep("\\|", df$var2)]
df$var1 %in% df$var1[grep("&", df$var2)]

so you can use for example

ifelse(grepl("\\|",df$var2),'|','')
ifelse(grepl("&",df$var2),'&','')
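The two ifelse() calls each produce a separate vector; nesting them gives a single column. A sketch, assuming a row containing both separators should get "&" (matching the question's code, where t2 overwrites t1). fixed = TRUE makes grepl() treat the pattern as a literal string, so "|" needs no escaping:

```r
df <- data.frame(var1 = c("A", "B", "C", "D", "E"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none"),
                 stringsAsFactors = FALSE)

# Test for "&" first, then "|", and fall back to ""
df$var3 <- ifelse(grepl("&", df$var2, fixed = TRUE), "&",
                  ifelse(grepl("|", df$var2, fixed = TRUE), "|", ""))
print(df)
```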

Upvotes: 0

Pop

Reputation: 12411

You can use this instead:

t3 <- rep("",length(df$var1))
t3[which(grepl("&",df$var2))] <- "&"
t3[which(grepl("\\|",df$var2))] <- "|"
df$var3 <- t3
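In this approach the later assignment wins, so a string containing both separators ends up with "|" (the opposite of the question's code). Logical indexing also works directly, without which(). A sketch with a hypothetical sixth row "P & Q | R" added to show the ordering:

```r
df <- data.frame(var1 = c("A", "B", "C", "D", "E", "F"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O", "none",
                          "P & Q | R"),  # hypothetical row with both separators
                 stringsAsFactors = FALSE)

t3 <- rep("", nrow(df))
t3[grepl("&", df$var2)] <- "&"    # logical index, no which() needed
t3[grepl("\\|", df$var2)] <- "|"  # runs last, so "|" wins in row F
df$var3 <- t3
print(df)
```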

Upvotes: 1
