sciengene
sciengene

Reputation: 5

Removing all words that start with same characters in a cell in excel

I had some issues downloading my data from a server. I have many genes and their expression values merged into one cell rather than split into multiple columns. Some cells have around 30 or so genes and their expression values. I can recover the expression values on the server if I can simply get the gene names out. All the genes start with 'Bn' so I tried this below to remove all words starting in a certain way, but it only works for around 2-3 words starting a certain way:

Extract several words that start with specific char EXCEL

Does anyone know how I can extract all words starting with a certain character so that I can get a list of gene names? Thanks for any help! :)

Upvotes: 0

Views: 328

Answers (1)

JvdV
JvdV

Reputation: 75870

If your data sits in A1 and one has TEXTJOIN, then try:

=TEXTJOIN("; ",1,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,";"," ")," ","</s><s>")&"</s></t>","//s[starts-with(.,'Bn')]"))

Confirm through CtrlShiftEnter

enter image description here


Data used:

BnaC01g15960D; --; Protein of unknown function6.00Nan2.680.54 3.392.689.392.703.351.26 2.500.596.888.013.69 BnaC02g00590D; --; AT5G0997618.006.831.704.594.934.648.964.034.161.964.701.7611.7710.639.08 BnaA05920890D; CYP707A4; cytochrome P450

Upvotes: 2

Related Questions