Reputation: 33
I have a list of strings in cells - 1000s of them - and I need to work out the characters per word but separated by word - preferably in 1 swift formula...
For Example: 1. "Black Cup With Handle" > Formula I need > 5,3,4,6
I need this for a recurring task which has been macro'd in a very inefficient way to count words into columns (of which we need to use up to 20 for the just encase) but this needs to be tackled.
Usually, we count the spaces and layer nested serach() formulas to piggyback onto one and other to break down the structure then character counts the individual words...
I could alternatively the macro to substitute the spaces for commas and used text to columns but that still leaves me with a prolonged counting process for what im looking for
we obviously use =LEN(A1)-LEN(SUBSTITUTE(A1," ",""))
to count the spaces in the word
we currently then use =SEACRH()
function combined with =MID()
functions (and some bizarre numbers) to reveal each word into its own individual cell
then =LEN
once again bu on all individual words - very long-winded
Im hoping to find a shorter way to do this but feeling there may not be a dynamic enough way to do it with formula alone, hoping someone can prove me wrong!
Upvotes: 3
Views: 1002
Reputation: 75960
You'll have different options depending on your Excel version.
OPTION 1: TEXTJOIN
I think you are looking for a TEXTJOIN
function. Just bare in mind that you can only use this the more later versions of Excel (see link to documentation) and it could work like this:
Formula in B1
:
=TEXTJOIN(",",TRUE,LEN(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s")))
NOTE: It's an array formula and you need to enter it using CtrlShiftEnter
To make it so that you won't need to use the above key-combo, we can include an INDEX
:
=TEXTJOIN(",",TRUE,INDEX(LEN(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s")),))
Additional Information:
This function takes (as per documentation) two required arguments:
Because we want to return an array of elements (words) from the cell, we need to SUBSTITUTE
the spaces for end-tags (</..>
) and concatenate that with a start-tag (<..>
) at the start of the string and another end-tag at the end.
I'll have to rely on an XML
explaination on the tags as to why <?><?>
works and it's meaning, because as far as my testing goes I could swap the letters around or replace by another letter with the same results as long as the final Xpath
would resemble the same character. It would be great if someone would be able to complement this answer with a better explanation on this matter.
For more FILTERXML
"tricks", have a look here
If you are a Office 365 subscriber or own Excel 2019 you can make use of this function. There are (as per documentation) at least 3 required arguments:
TRUE
or FALSE
and determines whether or not you want to exclude/include empty valuesNow this is where we can join the two functions together, FILTERXML
returning an array which we can use in TEXTJOIN
.
I'll have to explain the use of these functions together. I don't think LEN
and INDEX
will need much of an introduction on their own, but together they work quite nicely. Natively there will be a force called implicit intersection that will prevent LEN
from returning an array of values when you pass an array of values to the function, in this case through our FILTERXML
.
Normally you would disable this mechanism using a key combination of: CtrlShiftEnter, better known as CSE
.
Now what INDEX
does is disabling this implicit intersection making LEN
able to return an array, removing the need to CSE
the formula. INDEX
is one of the functions that has this "power". A more in depth explanation on implicit intersection can be found here
OPTION 2: UDF
Without access to TEXTJOIN
I think you'll need to have a look at using an UDF, possibly looking like below:
Function TEXTJOIN(rng As Range) As String
TEXTJOIN = Join(Application.Evaluate("LEN({""" & Join(Split(rng, " "), """,""") & """})"), ",")
End Function
You can call this in B1
like so: =TEXTJOIN(A1)
Additional Information:
The UDF consists out of three main mechanisms that work together:
This funciton takes two parameters, where the first one is required:
The function returns a string value
This function takes a string and delimits it by a specified character/substring. It takes the following arguments:
In this case we would only need the first two arguments.
This is IMO one of the most handy mechanisms you can use to pull of a returned array of values without having to loop through items/cells. It may get slow when you feed the function a large array formula, but in this case it will be fine. The funtion converts a Microsoft Excel name into an object or value, and when we pass it an formula, it thus will return the results. In this particular case it will return an array.
Upvotes: 4
Reputation: 8230
If you want to use a VBA code you could try this:
Option Explicit
Sub test()
Dim arr As Variant
Dim str As String
Dim i As Long, j As Long, LastRow As Long
With ThisWorkbook.Worksheets("Sheet1")
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
For i = 1 To LastRow
str = .Range("A" & i).Value
arr = Split(str, " ")
For j = LBound(arr) To UBound(arr)
.Cells(i, j + 3).Value = Len(arr(j))
Next j
Next i
End With
End Sub
Results:
Upvotes: 0
Reputation: 21639
I'm not totally clear on what end-result you're looking for, or whether you're okay with a VBA solution, but this is my interpretation:
Function lengths(txt As String) As String
Dim wrd
For Each wrd In Split(txt)
If lengths <> "" Then lengths = lengths & ","
lengths = lengths & Len(wrd)
Next wrd
End Function
Paste the code into a VBA module and then, for example, if A1
contained Black Cup With Handle
then in another cell you could use =length(A1)
which would return 5,3,4,6
.
Upvotes: 1