I have an R dataframe that looks like this:
         Gene Symbol    Prom 1    Prom 2    Prom 3
1 Gm16088 // Gm16088  7.168819  7.410547  7.634662
2            Gm26206  7.006416  6.824151  6.941721
3   Gm1992 // Gm1992  6.750240  6.591182  6.479798
4            Gm10568  4.390371  4.496734  4.672061
5            Gm22307 13.196217 13.157953 13.601210
6 Gm16041 // Gm16041  5.146015  5.450036  5.388205
7 Gm17101 // Gm17101  6.434086  6.752058  6.603427
In the Gene Symbol column, some cells contain the same gene symbol repeated several times, separated by " // ". In some rows, the symbol is repeated a hundred times. Is there a way to keep only one copy of the symbol, so that the rows look like this:
  Gene Symbol   Prom 1   Prom 2   Prom 3
1     Gm16088 7.168819 7.410547 7.634662
Instead of like this:
         Gene Symbol   Prom 1   Prom 2   Prom 3
1 Gm16088 // Gm16088 7.168819 7.410547 7.634662
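To try the approaches on this thread, a minimal sketch that rebuilds the printed rows as an R data frame may help. It assumes the column names really contain spaces (hence the backtick-quoted names and check.names = FALSE); the real data presumably has more rows and longer repeated cells than the seven shown.

```r
# Rebuild the seven rows printed in the question.
# check.names = FALSE keeps the spaces in names such as "Gene Symbol".
df <- data.frame(
  `Gene Symbol` = c("Gm16088 // Gm16088", "Gm26206", "Gm1992 // Gm1992",
                    "Gm10568", "Gm22307", "Gm16041 // Gm16041",
                    "Gm17101 // Gm17101"),
  `Prom 1` = c(7.168819, 7.006416, 6.750240, 4.390371, 13.196217, 5.146015, 6.434086),
  `Prom 2` = c(7.410547, 6.824151, 6.591182, 4.496734, 13.157953, 5.450036, 6.752058),
  `Prom 3` = c(7.634662, 6.941721, 6.479798, 4.672061, 13.601210, 5.388205, 6.603427),
  check.names = FALSE,
  stringsAsFactors = FALSE  # keep gene symbols as character, not factor
)
str(df)
```

Columns with spaces in their names must then be accessed with backticks, e.g. df$`Gene Symbol`.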
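We could also use word() from the stringr package, which, with its default space separator, returns the first word of each string: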
library(stringr)
x <- c("Gm16088 // Gm16088", "Gm26206")
word(x, 1)
#[1] "Gm16088" "Gm26206"
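If stringr is not installed, a base-R sketch of the same "keep the first word" idea splits each string on single spaces with strsplit() and keeps the first piece. This is a swapped-in equivalent, not how word() is implemented, and first_word is just an illustrative name.

```r
x <- c("Gm16088 // Gm16088", "Gm26206")

# Split each string on a literal single space and keep the first piece.
# A string with a leading space would yield "" here, so trim it first if needed.
first_word <- vapply(strsplit(x, " ", fixed = TRUE), `[`, character(1), 1)
first_word
#> [1] "Gm16088" "Gm26206"
```

On the data frame, the same line would take df$`Gene Symbol` in place of x, with the result assigned back to that column.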
You could try using gsub():
> x <- "Gm16088 // Gm16088"
> gsub("\\s*//.*", "", x)
[1] "Gm16088"
In your actual code, you would replace x with df$`Gene Symbol` (where df is the name of the data frame) and assign the result back to that column.
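Putting the gsub() answer together on a data frame might look like the sketch below. The small df is hypothetical: the first two symbols come from the question, and the third is made up to mimic a symbol repeated more than twice. The optional check assumes every cell should contain copies of a single symbol; it stops with an error if a cell joins different symbols, because cutting at the first "//" would silently drop them.

```r
# Hypothetical data in the shape of the question's data frame.
df <- data.frame(
  `Gene Symbol` = c("Gm16088 // Gm16088", "Gm26206", "Gm1992 // Gm1992 // Gm1992"),
  `Prom 1` = c(7.168819, 7.006416, 6.750240),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Optional safety check: every " // "-separated piece in a cell should be the same symbol.
pieces <- strsplit(df$`Gene Symbol`, "\\s*//\\s*")
all_same <- vapply(pieces, function(p) length(unique(p)) == 1L, logical(1))
stopifnot(all(all_same))

# Drop the first "//" (with any whitespace before it) and everything after it,
# then write the cleaned symbols back into the column.
df$`Gene Symbol` <- gsub("\\s*//.*", "", df$`Gene Symbol`)
df
```

Because ".*" consumes the rest of the string, sub() would give the same result here; gsub() is kept to match the answer.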