Reputation: 585
I have a problem that I can't figure out how to solve in R. I'll try to write it in pseudocode:
take contents from one cell of matrix$column
transform each group of characters, e.g. "RA FG+ FZFG BR", using if statements, i.e. RA = 1, FG+ = 3, FZ = 1, FG = 2, BR = 2
I've tried the stringr package and its word() function, but I'm not sure how to loop through the words.
library(stringr)
strip <- function(data){
  n <- str_count(data, " ") + 1   # number of space-separated words
  for (i in seq_len(n)) {         # loop over 1..n, not only over n itself
    print(word(data, i, sep = " "))
  }
  return(data)
}
strip("RA FG+ FZFG BR")
Cheers for looking
P.S. Thanks to all of your awesome help, I found that gsub("FG", " ", x) worked well for the FZFG pairings.
P.P.S. It may not be technically correct, but converting the factor with as.list() and then to numeric gave an interesting result that I'm sticking with for now.
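On converting a factor to numeric: as.numeric() applied directly to a factor returns the internal level codes, not the labels. This is a common source of "interesting" results. Here is a small illustration with made-up values:

```r
# Factor whose labels look like numbers (made-up data)
f <- factor(c("10", "2", "2", "7"))   # levels sort as text: "10" "2" "7"

as.numeric(f)                  # level codes: 1 2 2 3
as.numeric(as.character(f))    # the labels themselves: 10 2 2 7
as.numeric(levels(f))[f]       # same labels, converting each level only once
```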
Upvotes: 1
Views: 112
Reputation: 483
I have some doubts about the values of the letter groups, but I came up with a general answer. This would be the code:
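Strip <- function(word){
  word <- toupper(word)  # just in case
  # "FG+" must come before "FG" so the longer code is replaced first
  group <- c("RA", "FG+", "FZ", "FG", "BR")
  value <- c("1", "3", "1", "2", "2")
  for (i in seq_along(group)) {
    # fixed = TRUE matches "+" literally; padding the value with spaces
    # keeps neighbouring groups (e.g. "FZFG") as separate numbers
    word <- gsub(group[i], paste0(" ", value[i], " "), word, fixed = TRUE)
  }
  word <- strsplit(trimws(word), "\\s+")[[1]]  # split on runs of spaces
  word <- sum(as.numeric(word))
  return(word)
}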
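A different technique from the one used in this answer is to split the string into words and look each word up in a named vector. Here is a minimal sketch. It assumes FZFG is a single code worth 2, as in the other answers. The names codes and score_string are hypothetical.

```r
# Hypothetical lookup table: names are the codes, values are their scores
codes <- c("RA" = 1, "FG+" = 3, "FZFG" = 2, "BR" = 2)

score_string <- function(s) {
  tokens <- strsplit(trimws(toupper(s)), "\\s+")[[1]]  # split on whitespace
  vals <- codes[tokens]          # words not in the table give NA
  sum(vals, na.rm = TRUE)        # ignore words not in the table
}

score_string("RA FG+ FZFG BR")   # 1 + 3 + 2 + 2 = 8
```

With a lookup table, there is no replacement order to get right. Changing a value means editing one entry.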
Then you can apply this function to every cell of a data frame x like this:
new_x <- as.data.frame(apply(x, c(1, 2), Strip))  # MARGIN = c(1, 2): one call per cell
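To see what MARGIN = c(1, 2) does, here is a self-contained sketch. It uses a made-up data frame and a hypothetical per-cell scorer, score_cell, which stands in for Strip:

```r
# Small data frame of code strings (made-up data)
x <- data.frame(a = c("RA BR", "FG+"),
                b = c("FZFG", "RA FG+"),
                stringsAsFactors = FALSE)

# Hypothetical per-cell scorer using a named lookup vector
codes <- c("RA" = 1, "FG+" = 3, "FZFG" = 2, "BR" = 2)
score_cell <- function(s) sum(codes[strsplit(s, " ")[[1]]], na.rm = TRUE)

# apply() turns x into a character matrix; MARGIN = c(1, 2) calls
# score_cell once per cell and returns a matrix of the same shape
new_x <- as.data.frame(apply(x, c(1, 2), score_cell))
new_x   # column a: 3 3, column b: 2 4
```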
In case you need other values for the groups, just change the group and value vectors in the Strip function.
Hope this can help you.
Upvotes: 1
Reputation: 2088
OK, here is a solution based on my understanding of your problem.
Let's say we have a data frame mat with strings in column colA:
mat <- structure(list(colA = c("RA FG+ FZFG BR", " FG+ FZFG BR RA",
"RFA FGP+ FFG BR", "RA FGs1+ FSZFG BR")),
class = "data.frame", .Names = "colA",
row.names = c(NA, -4L))
which looks like this (the first row is your example, so it should give 8):
colA
1 RA FG+ FZFG BR
2 FG+ FZFG BR RA
3 RFA FGP+ FFG BR
4 RA FGs1+ FSZFG BR
The function to convert all the sought words to numbers:
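change_it <- function(x){
  x <- gsub("^RA", "1", x)
  x <- gsub("^FG\\+", "3", x)
  x <- gsub("^FZFG", "2", x)
  x <- gsub("^BR", "2", x)
  x <- gsub("^FG", "2", x)   # runs after "^FG\\+", so "FG+" is already 3
  as.numeric(x)              # words that were not replaced become NA (with a warning)
}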
The ^ anchors each pattern to the start of a word, so only a group at the beginning of a word is replaced. For example, RARA does not become 11, and DSRA does not become DS1.
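Here is a quick check of what the ^ anchor does, and what adding $ changes, on a few made-up words:

```r
words <- c("RA", "RARA", "DSRA", "RAX")

gsub("^RA", "1", words)    # "1" "1RA"  "DSRA" "1X"
gsub("^RA$", "1", words)   # "1" "RARA" "DSRA" "RAX"
```

With ^ alone, RARA becomes "1RA", which as.numeric() then turns into NA. With ^...$, only words that are exactly the code are replaced.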
And now apply it (using dplyr):
library(dplyr)

mat2 <- mat %>%
  mutate(colB = strsplit(colA, " ")) %>%
  mutate(colB = lapply(colB, change_it)) %>%   # lapply always returns a list
  rowwise() %>%
  mutate(colC = sum(colB, na.rm = TRUE))
Intermediate result (before rowwise):
colA colB
1 RA FG+ FZFG BR 1, 3, 2, 2
2 FG+ FZFG BR RA NA, 3, 2, 2, 1
3 RFA FGP+ FFG BR NA, NA, NA, 2
4 RA FGs1+ FSZFG BR 1, NA, NA, 2
And the result:
colA colB colC
<chr> <list> <dbl>
1 RA FG+ FZFG BR <dbl [4]> 8.00
2 " FG+ FZFG BR RA" <dbl [5]> 8.00
3 RFA FGP+ FFG BR <dbl [4]> 2.00
4 RA FGs1+ FSZFG BR <dbl [4]> 3.00
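If you'd rather not depend on dplyr, the same row sums can be computed in base R. This sketch replaces the regex replacements with an exact-match lookup table (a hypothetical codes vector). As a result, a word only counts when it is exactly one of the codes:

```r
# Same example strings as in colA above
colA <- c("RA FG+ FZFG BR", " FG+ FZFG BR RA",
          "RFA FGP+ FFG BR", "RA FGs1+ FSZFG BR")

# Hypothetical exact-match lookup table mirroring change_it()
codes <- c("RA" = 1, "FG+" = 3, "FZFG" = 2, "BR" = 2, "FG" = 2)

# Split each string into words, look each word up, add up the matches
colC <- vapply(strsplit(colA, " "),
               function(tokens) sum(codes[tokens], na.rm = TRUE),
               numeric(1))
colC   # 8 8 2 3
```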
There will be some NA warnings from the numeric conversion. It also won't work if you have isolated numbers, for example "RA FG 32 FD", because "32" will be converted to a number and counted. That would need more filtering; to remedy it, you could make explicit tests in change_it instead of using gsub.
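One way to follow that suggestion is to test each word exactly against the known codes instead of rewriting it with gsub(). Here is a minimal sketch (change_it_exact is a hypothetical name). Unknown words, including stray numbers, simply become NA, and no conversion warnings are produced:

```r
# Hypothetical replacement for change_it(): exact tests instead of gsub()
change_it_exact <- function(x) {
  codes <- c("RA" = 1, "FG+" = 3, "FZFG" = 2, "BR" = 2, "FG" = 2)
  unname(codes[x])   # words not in the table, including "32", become NA
}

tokens <- strsplit("RA FG 32 FD", " ")[[1]]
sum(change_it_exact(tokens), na.rm = TRUE)   # 1 + 2 = 3; "32" is ignored
```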
Upvotes: 1
Reputation: 20095
One solution is to use gsubfn and eval:
library(gsubfn)
eval(parse(text =
gsub(" ","+",gsubfn("\\w+\\+?",
list("RA" = 1, "FG+" = 3, "FZFG" = 2, "BR" = 2),
"RA FG+ FZFG BR"))))
#Result:
# 8
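For comparison, here is a base R sketch that avoids both gsubfn and eval(parse()). regmatches() with the same pattern pulls out the codes, and a named vector (the hypothetical codes) supplies their values:

```r
x <- "RA FG+ FZFG BR"
codes <- c("RA" = 1, "FG+" = 3, "FZFG" = 2, "BR" = 2)

# Same pattern as the gsubfn call: a word, optionally followed by "+"
tokens <- regmatches(x, gregexpr("\\w+\\+?", x))[[1]]
tokens                              # "RA" "FG+" "FZFG" "BR"
sum(codes[tokens], na.rm = TRUE)    # 8; codes missing from the table are skipped
```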
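You can use matrix$column in place of the hard-coded string in the expression above and store the result in another column of your data frame. Because eval() on several parsed strings returns only the last value, evaluate each row's string separately:
exprs <- gsub(" ", "+", gsubfn("\\w+\\+?",
                               list("RA" = 1, "FG+" = 3, "FZFG" = 2, "BR" = 2),
                               matrix$column))
matrix$sum <- sapply(exprs, function(s) eval(parse(text = s)), USE.NAMES = FALSE)  # one sum per row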
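The following check uses made-up strings of the kind that gsub(" ", "+", ...) produces. It shows the difference between evaluating a whole vector at once and evaluating each element on its own:

```r
# Made-up strings, one per row
exprs <- c("1+3+2+2", "2+2", "1")

eval(parse(text = exprs))   # 1: only the last expression's value
sapply(exprs, function(s) eval(parse(text = s)), USE.NAMES = FALSE)   # 8 4 1
```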
Upvotes: 3
Reputation: 548
Consider this approach, and apply it to every cell of your matrix with a for loop:
a <- "ab//*--cd#%@"
Now, say I want to replace the letters as follows: a = 1, b = 2, c = 3, d = 4:
b <- gsub("a", "1", a)
b <- gsub("b", "2", b)
b <- gsub("c", "3", b)
b <- gsub("d", "4", b)
Remove all the unwanted symbols from the cell:
b <- gsub("\\D", "", b)
Make the cell numeric (so we can do math on it):
b <- as.numeric(unlist(strsplit(b, "")))
Now it is ready to be summed:
sum(b)
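To apply these steps to every cell of a matrix with a for loop, one option is to wrap them in a helper (cell_sum, a hypothetical name) and loop over the rows and columns. The matrix below is made up. Every cell is assumed to contain at least one of the letters a to d; a cell with no digits left after cleaning would not give a meaningful sum.

```r
# Hypothetical helper wrapping the steps above for one cell
cell_sum <- function(a) {
  b <- gsub("a", "1", a)
  b <- gsub("b", "2", b)
  b <- gsub("c", "3", b)
  b <- gsub("d", "4", b)
  b <- gsub("\\D", "", b)                    # keep digits only
  sum(as.numeric(unlist(strsplit(b, ""))))   # add up the single digits
}

# Made-up 2 x 2 character matrix (filled column by column)
m <- matrix(c("ab//*--cd#%@", "a", "cc", "d#d"), nrow = 2)

result <- matrix(NA_real_, nrow(m), ncol(m))
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    result[i, j] <- cell_sum(m[i, j])
  }
}
result   # row 1: 10 6, row 2: 1 8
```

Because each digit is summed separately, this only works when every replacement value is a single digit.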
Upvotes: 1