Reputation: 1
I'm currently working with a large dataset based on people's reviews. The dataset includes 25 thousand individuals' reviews of different books, and each row on the text column is unique and has lots of sentences.
My issue is that I'd like to code for whether the text contains any ALL CAPS words. For instance, someone could say: "This book is AMAZING! I loved it!." Normally, excel allows you to directly identify when the cell is entirely written in all caps, but this situation is a bit more unique than that, given that I want to be able to see whether any word within the text is written in all caps. Any suggestions would be appreciated.
I googled possible solutions, and also ask ChatGPT3.5 and GPT4. Nothing that came out gave me an implementable suggestion (GPT was giving codes that were too long for excel to carry out, and the shorter ones it gave were plain wrong)
Upvotes: 0
Views: 178
Reputation: 6749
You can try the following, assuming your input data is in cell A1
and considering also additional conditions indicated in the comment section of the question: 1) Exclude words with length equal to 1, 2) Consider as word delimiter ,
too:
=LET(s,TEXTSPLIT(A1,{" ",","}), f, FILTER(s,(LEN(s) >1)
* EXACT(s, UPPER(s)),""), IF(@f="", FALSE, TRUE))
It can be encapsulated in a user LAMBDA
function Z
, so you can use it for array ranges.
=LET(A, A1:A4, Z, LAMBDA(x, LET(s,TEXTSPLIT(x,{" ",","}),
f, FILTER(s,(LEN(s) >1) * EXACT(s, UPPER(s)),""), IF(@f="", FALSE, TRUE))),
BYROW(A, LAMBDA(x, Z(x))))
Here is the output:
It identifies the word by
or ,
delimiters, then uses the EXACT
function to do case sensitive comparisons.
Keep in mind that if you have AMAZING01
it considers this case as valid too. If you want to exclude it, you need to define the list of allowed characters (lk
) and check that only such characters can appear. For example for upper case letters exclamation sign and punctuation only, you can consider the following:
=LET(lk, VSTACK(CHAR(ROW(65:90)), "!",".", ";"), s,TEXTSPLIT(A1,{" ",","}),
f, FILTER(s,(LEN(s) >1) * EXACT(s, UPPER(s))
* ISERR(TEXTSPLIT(s, lk,,1)),""), IF(@f="", FALSE, TRUE))
You can check here the list of ASCII characters and add the ones you consider in a similar way. You can use CHAR(33)
instead of "!"
for example. It makes sense to use the function in case you can find a range of valid consecutive values from the ASCII list of characters.
The expression: TEXTSPLIT(s, lk,,1)
extracts from s
name on each cell everything that is not a valid character and produce an error of type Empty Array
(#CALC!
) when all characters are valid and this the condition we are looking for.
Upvotes: 1
Reputation: 493
Assuming you can put your data into a Table in Excel you can utilise PowerQuery to do this pretty easily, using some dummy data as shown below.
Select a cell in your table then go to Data => Get &Transform Data => From Table.Range
This will load up the Power Query editor and if you go to Add Column => Custom Column and enter the following it'll add a True/False column that indicates if there's any capital words for each row
Then go to File => Close & Load To...
and choose to load it to a Table on a new or existing sheet and it'll give you the data
The HasCaps code works by splitting the text column at every space " " and creating a list, it then checks if each item in that list is the same when you capitalise it, and then checks if there are any 'trues' in the list.
Upvotes: 1