Rick Rothstein
04-26-2012, 10:26 AM
.
*CORRECTED CODE AS OF JULY 31, 2014 (See Message #6 For Details) ****
The InStr function is a handy tool. It will return the character position of text that is embedded within a larger text string. And while this is handy, the function is useless for finding words as stand-alone words in the text. Let's say you had this text string...
TextString = "Don't tell me, but your name is Don, right?"
and you wanted to find the character position of the name Don in that text. If you used the InStr function like this...
MsgBox InStr(TextString, "Don")
It would return the number 1 because it would find the letters "Don" inside the word "Don't" and never look further into the text to see if Don existed as a stand-alone word or not. Even using InStr's optional arguments and having it perform a binary search would not help because the letter casing for the first 3 characters of "Don't" is the same a for the name "Don". Here is a function which can be called from your VB code or which can be used as a UDF on a worksheet, if desired, that will find text as stand-alone words that are not embedded as part of other words...
Function InStrExact(Start As Long, SourceText As String, WordToFind As String, _
Optional CaseSensitive As Boolean = False, _
Optional AllowAccentedCharacters As Boolean = False) As Long
Dim x As Long, Str1 As String, Str2 As String, Pattern As String
Const UpperAccentsOnly As String = "ÇÉÑ"
Const UpperAndLowerAccents As String = "ÇÉÑçéñ"
If CaseSensitive Then
Str1 = SourceText
Str2 = WordToFind
Pattern = "[!A-Za-z0-9]"
If AllowAccentedCharacters Then Pattern = Replace(Pattern, "!", "!" & UpperAndLowerAccents)
Else
Str1 = UCase(SourceText)
Str2 = UCase(WordToFind)
Pattern = "[!A-Z0-9]"
If AllowAccentedCharacters Then Pattern = Replace(Pattern, "!", "!" & UpperAccentsOnly)
End If
For x = Start To Len(Str1) - Len(Str2) + 1
If Mid(" " & Str1 & " ", x, Len(Str2) + 2) Like Pattern & Str2 & Pattern _
And Not Mid(Str1, x) Like Str2 & "'[" & Mid(Pattern, 3) & "*" Then
InStrExact = x
Exit Function
End If
Next
End Function
This function takes three required arguments and provides for two additional optional arguments. The first argument, named Start, is required and lets you set the starting character number for the search to begin at (use 1 to start the search at the first character). The second argument, named SourceText, is required and is the text to look for the word in. The third argument, named WordToFind, is required and, as the name indicates, is the word whose position you want to locate. The fourth argument, named CaseSensitive, is optional and allows you to make the search case sensitive or not (the default is FALSE representing a case insensitive search). The fifth argument, named AllowAccentedCharacters, is optional and allows certain accented letters to be considered as being part of a word (the default is False meaning only normal ASCII characters can make up the word). For example, with the default value of False, the word "resumé" would never be found because the accented "e" (é) would be considered a non-letter. Setting the fifth argument to True would allow the function to find the word "resumé". I only allowed for three accented letters "ç", "é" and "ñ" (in both lower and upper case), but you can add more if you wish by modifying the UpperAccentsOnly and UpperAndLowerAccents constants (the Const statements at the beginning of the code).
HOW TO INSTALL UDFs
------------------------------------
If you are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use InStrExact just like it was a built-in Excel function. For example,
=InStrExact(1,A1,B1)
*CORRECTED CODE AS OF JULY 31, 2014 (See Message #6 For Details) ****
The InStr function is a handy tool. It will return the character position of text that is embedded within a larger text string. And while this is handy, the function is useless for finding words as stand-alone words in the text. Let's say you had this text string...
TextString = "Don't tell me, but your name is Don, right?"
and you wanted to find the character position of the name Don in that text. If you used the InStr function like this...
MsgBox InStr(TextString, "Don")
It would return the number 1 because it would find the letters "Don" inside the word "Don't" and never look further into the text to see if Don existed as a stand-alone word or not. Even using InStr's optional arguments and having it perform a binary search would not help because the letter casing for the first 3 characters of "Don't" is the same a for the name "Don". Here is a function which can be called from your VB code or which can be used as a UDF on a worksheet, if desired, that will find text as stand-alone words that are not embedded as part of other words...
Function InStrExact(Start As Long, SourceText As String, WordToFind As String, _
Optional CaseSensitive As Boolean = False, _
Optional AllowAccentedCharacters As Boolean = False) As Long
Dim x As Long, Str1 As String, Str2 As String, Pattern As String
Const UpperAccentsOnly As String = "ÇÉÑ"
Const UpperAndLowerAccents As String = "ÇÉÑçéñ"
If CaseSensitive Then
Str1 = SourceText
Str2 = WordToFind
Pattern = "[!A-Za-z0-9]"
If AllowAccentedCharacters Then Pattern = Replace(Pattern, "!", "!" & UpperAndLowerAccents)
Else
Str1 = UCase(SourceText)
Str2 = UCase(WordToFind)
Pattern = "[!A-Z0-9]"
If AllowAccentedCharacters Then Pattern = Replace(Pattern, "!", "!" & UpperAccentsOnly)
End If
For x = Start To Len(Str1) - Len(Str2) + 1
If Mid(" " & Str1 & " ", x, Len(Str2) + 2) Like Pattern & Str2 & Pattern _
And Not Mid(Str1, x) Like Str2 & "'[" & Mid(Pattern, 3) & "*" Then
InStrExact = x
Exit Function
End If
Next
End Function
This function takes three required arguments and provides for two additional optional arguments. The first argument, named Start, is required and lets you set the starting character number for the search to begin at (use 1 to start the search at the first character). The second argument, named SourceText, is required and is the text to look for the word in. The third argument, named WordToFind, is required and, as the name indicates, is the word whose position you want to locate. The fourth argument, named CaseSensitive, is optional and allows you to make the search case sensitive or not (the default is FALSE representing a case insensitive search). The fifth argument, named AllowAccentedCharacters, is optional and allows certain accented letters to be considered as being part of a word (the default is False meaning only normal ASCII characters can make up the word). For example, with the default value of False, the word "resumé" would never be found because the accented "e" (é) would be considered a non-letter. Setting the fifth argument to True would allow the function to find the word "resumé". I only allowed for three accented letters "ç", "é" and "ñ" (in both lower and upper case), but you can add more if you wish by modifying the UpperAccentsOnly and UpperAndLowerAccents constants (the Const statements at the beginning of the code).
HOW TO INSTALL UDFs
------------------------------------
If you are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use InStrExact just like it was a built-in Excel function. For example,
=InStrExact(1,A1,B1)