PDA

View Full Version : FindWord - Find A (Possibly Listed) Word, As A Word, Not Embedded Within Another Word



Rick Rothstein
10-12-2012, 01:30 AM
NOTE: Minor addition (highlighted in red) to the code made on October 16, 2012

The inspiration for this article came from this blog article by Debra Dalgleish ("Find Text With INDEX and MATCH (http://blog.contextures.com/archives/2012/10/09/find-text-with-index-and-match/)") that I found on her Contextures Blog (http://blog.contextures.com/). In that article, Debra shows a formula that identifies the location of any one word in a "list" of words from within a larger piece of text. This seemed to be a natural extension to my "InStrExact - Find Location Of A Word, As A Word, Not Embedded Within Another Word (http://www.excelfox.com/forum/f22/instrexact-find-location-word-word-not-embedded-within-another-word-390/)" article posted elsewhere here in Rick Rothstein's Corner (http://www.excelfox.com/forum/f22/). The resulting function below (which can be used as a UDF) actually incorporates the functionality of my InStrExact function completely except that I changed the way delimiters for words are handled. In the function below, the word break characters are specified directly with the user having the ability to extend the list as needed (unlike the InStrExact function which employed a more complex method to determine which characters could serve as a word break). First, here is the function, which I call FindWord...


Function FindWord(Start As Long, SourceText As String, ByVal WordsToFind As Variant, _
Optional CaseSensitive As Boolean = False, _
Optional AdditionalWordBreaks As String) As Long
Dim x As Long, Str1 As String, Pattern As String, WordBreaks As String, Word As Variant
WordBreaks = "[&"" '(),./:;<>?[\_`{|}~©®«»­´¶·¿!" & Chr$(160) & "-]"
For x = 1 To Len(AdditionalWordBreaks)
If InStr(WordBreaks, Mid(AdditionalWordBreaks, x, 1)) Then Mid(AdditionalWordBreaks, x, 1) = " "
Next
WordBreaks = Replace(WordBreaks, "-]", Replace(AdditionalWordBreaks, " ", "") & "-]")
If CaseSensitive Then
Str1 = SourceText
Else
Str1 = UCase(SourceText)
End If
If TypeName(WordsToFind) = "String" Then WordsToFind = Split(WordsToFind, Chr$(1))
For Each Word In WordsToFind
If Not CaseSensitive Then Word = UCase(Word)
For x = Start To Len(Str1) - Len(Word) + 1
If Mid(" " & Str1 & " ", x, Len(Word) + 2) Like WordBreaks & Word & WordBreaks Or _
Mid(" " & Str1 & " ", x, Len(Word) + 2) Like WordBreaks & Word & "[]]" Then
FindWord = x
Exit Function
End If
Next
Next
End Function

This function takes three required arguments and provides for two additional optional arguments. The first argument, named Start, is required and lets you set the starting character number for the search to begin at (use 1 to start the search at the first character). The second argument, named SourceText, is required and is the text to look for the word in. The third argument, named WordsToFind, is required and is word or list of words whose position you want to locate (more about this argument in a moment). The fourth argument, named CaseSensitive, is optional and allows you to make the search case sensitive or not (the default is FALSE representing a case insensitive search). The fifth argument, named AdditionalWordBreaks, allows you to include additional characters (one or more characters) as word break characters over and above the characters I hard-coded in the function. And those characters are...

[ Left square bracket
] Right square bracket
& Ampersand
" Quote mark
' Apostrophe
( Left Parenthesis
) Right Parenthesis
, Comma
. Dot
/ Slash
: Colon
; Semi-Colon
< Left Angle Bracket
> Right Angle Bracket
? Question Mark
\ Back Slash
_ Underline
` Grave Accent
{ Left Curly Brace
} Right Curly Brace
| Vertical Line
~ Tilde
© Copyright Symbol
® Registered Symbol
« Left Double Angle Bracket
» Right Double Angle Bracket
­
¶ Pilcrow Sign
· Middle Dot
¿ Spanish Question Mark
! Exclamation mark
- Dash
Space
Non-Breaking Space

If you have one or more characters that is not listed above that you would like to be used as a word break character, simply pass them in as a single text string for this fifth argument. For example, let's say you wanted to find the location for the number 12.34 within this text string...

"The value was 12.34% or thereabouts."

Because the percent sign is not defined as a word break character, you cannot simply look for the number 12.34 like this...


Position = FindWord(1, "The value was 12.34% or thereabouts.", "12.34")

rather, you have to tell the function the percent sign is to be considered a word break character like this...


Position = FindWord(1, "The value was 12.34% or thereabouts.", "12.34",,"%")


Okay, now about that third argument (WordsToFind)... this argument can be a single word, an array of words (either using a declared array or an array produced from the Array or Split function when called from other VB code or using an array of quoted words encased in curly braces {} from a worksheet formula) or a specified range when called from a worksheet formula. So, all of the following are valid ways to specify this...

From VB Code
---------------------------
"SomeWord"
VariableContainingWord (either simple String variable or Array variable)
Array("WordOne", "WordTwo", "WordThree", "Etc.")
Split("WordOne,WordTwo,WordThree,Etc.", ",")

From Excel Formula
--------------------------
"SomeWord"
{"WordOne","WordTwo","WordThree","Etc."}
A1
A1:A5

So, using the array constant argument in the above "From Excel Formula" section, you could specify this formula in a cell...

=FindWord(1,"This contains WordTwo in it",{"WordOne","WordTwo","WordThree","Etc."})

and it would return 15 as a result because one of the words in the list (WordTwo) can be found at character position 15 in the larger text string passed into the second argument.

One final note... while the emphasis has been on single words, word or words specified for the WordsToFind argument do not have to be single words... they can be multi-word phrases possibly containing characters that would otherwise be condered word break characters. So, you could have a text string like this...

"Her mother-in-law came to visit."

and you could use the FindWord function to locate "mother-in-law" even though the dash is considered to be a word break character.

HOW TO INSTALL UDFs
------------------------------------
If you are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use NameOfTheUDF just like it was a built-in Excel function. For example, assuming A1 contains the text string and B1 contains a word to find...

=FindWord(1,A1,B1)

Rick Rothstein
10-16-2012, 01:53 PM
Posting this message so that it will alert anyone subscribed to this article... see the note at the beginning of Message #1.