Reputation: 5
I had some issues downloading my data from a server. I have many genes and their expression values merged into one cell rather than split into multiple columns. Some cells have around 30 or so genes and their expression values. I can recover the expression values on the server if I can simply get the gene names out. All the genes start with 'Bn' so I tried this below to remove all words starting in a certain way, but it only works for around 2-3 words starting a certain way:
Extract several words that start with specific char EXCEL
Does anyone know how I can extract all words starting with a certain character so that I can get a list of gene names? Thanks for any help! :)
Upvotes: 0
Views: 328
Reputation: 75870
If your data sits in A1
and one has TEXTJOIN
, then try:
=TEXTJOIN("; ",1,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,";"," ")," ","</s><s>")&"</s></t>","//s[starts-with(.,'Bn')]"))
Confirm through CtrlShiftEnter
Data used:
BnaC01g15960D; --; Protein of unknown function6.00Nan2.680.54 3.392.689.392.703.351.26 2.500.596.888.013.69 BnaC02g00590D; --; AT5G0997618.006.831.704.594.934.648.964.034.161.964.701.7611.7710.639.08 BnaA05920890D; CYP707A4; cytochrome P450
Upvotes: 2