PDA

View Full Version : Extracting Custom Pattern Consisting of Numbers & Text From Alphanumeric String



larkspur
05-20-2013, 04:41 PM
I've number of different alphanumeric strings examples below
1.JET INTERNATIONALINV NO.ABC12539/01 ABC12540/01TCI12538/01
2.CABLE AND WIRELESSINV. 2000680771USA
3.CORPORACION 2000679533 2000679534 2000683140 2000683140 2000683140. SKY
4.SERVICIOS DIRECTOS 2000679523 2000679531 ENE 13 BBC WORLD FEB 13 BBC WORLD MAR 13 BBC WORLD SKY
I need to return the following:
1.ABC2539/01 ABC12540/01 TCI12538/01
2.2000680771
3.2000679533 2000679534 2000683140 2000683140 2000683140
4.2000679523 2000679531

Any suggestions will be much appreciated
Cheers
Larkspur


Hi Guys

Thanks for all the input, much appreciated

Excel Fox
05-20-2013, 06:38 PM
The text strings you want to extract have different arrangement of numeric and text characters. The first bit uses 3 text characters 5 numerics, a back slash and 2 numerics.

The others have 10 numerics. So what's the logic?

larkspur
05-20-2013, 07:05 PM
I'm trying to extract three different types of invoice numbers...
1st Appears with three letters LBC,LPA,SAA etc followed by five numbers then a back slash then two more numbers e.g. 12540/01
2nd Appears with 10 digits starting with 2000 e.g. 2000685684
3rd Appears with 10 digits starting with 1800 e.g. 1800140735

I'll probably need three separate formulas, however if I could combine the three that would be prefect...

Any ideas??


The text strings you want to extract have different arrangement of numeric and text characters. The first bit uses 3 text characters 5 numerics, a back slash and 2 numerics.

The others have 10 numerics. So what's the logic?

Excel Fox
05-20-2013, 07:30 PM
what about the PM you sent me? The last one here has 4 characters (TCIA)


Details appear in a number of different ways e.g.
1.MEDIA VISION DE PA 2000679547,2000679548,2000685031,2000685031,200068 5031 SKY
2.SVERIGESTELEVISIOLBC64745/01AGRREF810130423100000835
3.AVJET INTERNATIONA INV NO.TCIA12539/01 TCA12540/01TCI12538/01
I'd like to be able to return the following..
1. 2000679547 2000679548 2000685031 2000685031 2000685031
2. LBC64745/01
3. TCIA12539/01 TCA12540/01 TCI12538/01

I'd just love one excel formula to return the following results if possible

Can anyone help?
Cheers
Larkspur

larkspur
05-20-2013, 08:08 PM
Please ignore the TCIA, typing error


what about the PM you sent me? The last one here has 4 characters (TCIA)

Rick Rothstein
05-20-2013, 08:41 PM
Please ignore the TCIA, typing error
I'm sure Excel Fox won't mind if I jump in here. Based on your previous descriptions of your invoice numbers and what your data column values look like couple with how you want the data split out, I have a UDF (user defined function) for you to consider (I could not think of anyway to do it with normal Excel functions)...

Function InvoiceNumbers(Cell As Range) As String
' This function assumes output will start one cell to the right of the specified Cell argument
Dim X As Long, Index As Long, S As String, OutText As String
S = Cell.Text
For X = 1 To Len(S)
If Mid(S, X, 11) Like "[A-Za-z][A-Za-z][A-Za-z]#####/##" Then
OutText = OutText & " " & Mid(S, X, 11)
ElseIf Mid(S, X, 10) Like "2000######" Or Mid(S, X, 10) Like "1800######" Then
OutText = OutText & " " & Mid(S, X, 10)
End If
Next
Index = Application.Caller.Column - Cell.Column - 1
If Len(S) Then InvoiceNumbers = Split(Trim(OutText) & Space(Len(S)))(Index)
End Function
Assuming your data starts in cell A1, put this formula...

=InvoiceNumber($A1)

in cell B1 (note the $ sign to freeze the column letter) and copy it across for as many columns as you think you might ever need your data split out into and then copy all those cells down to the bottom of your data.

HOW TO INSTALL UDFs
------------------------------------
If you are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use NameOfTheUDF just like it was a built-in Excel function. See the example formula and instructions above.

larkspur
05-21-2013, 12:25 AM
Works great Thanks for the input Rothstein, much appreciated

snb
05-21-2013, 02:02 AM
or ?


Function snb_002(c00)
snb_002 = Join(Filter(Split(Join(Filter(Split(c00, "."), "0"), "")), "0"))
End Function

Rick Rothstein
05-21-2013, 02:28 AM
or ?


Function snb_002(c00)
snb_002 = Join(Filter(Split(Join(Filter(Split(c00, "."), "0"), "")), "0"))
End Function
Interesting... it never occurred to me that the OP might have wanted the returned value in the same cell (which I'm guessing he didn't given his response to my posting), however, that is a perfectly reasonable reading of the original posting. I don't know if you want to try to fix it or not (given the OP's response to my posting), but your output is incorrect for the 2nd sample data string....

CABLE AND WIRELESSINV. 2000680771USA

as it retains the USA which the OP showed as not desired in his example output. It is not clear if the other date format (the one that starts with three letters) could have unwanted attached letters on the end of it or not, so if you try to post a fix (for the record), you might want to assume yes and design your fix around it as well.

larkspur
05-21-2013, 05:24 PM
very interesting.... thanks snb, however I'd need your function to return the following formats:
1. 3 letters LBC,ABC,FXI etc followed by 5 numerics, a back slash then 2 numerics e.g. 12540/01
2. 2000 followed by 6 numerics
3. 1800 followed by 6 numerics

Rick Rothstein
05-22-2013, 01:04 AM
very interesting.... thanks snb, however I'd need your function to return the following formats:
1. 3 letters LBC,ABC,FXI etc followed by 5 numerics, a back slash then 2 numerics e.g. 12540/01
2. 2000 followed by 6 numerics
3. 1800 followed by 6 numerics
If you are interested in having the output as a comma delimited string within a single cell, like 'snb' assumed you wanted, as opposed to the multi-cell output of my original UDF, you could use the following UDF (note I used the same function name, so you would have to comment out or delete my previous UDF from your module before running this one)...

Function InvoiceNumbers(S As String) As String
Dim X As Long, OutText As String
For X = 1 To Len(S)
If Mid(S, X, 11) Like "[A-Za-z][A-Za-z][A-Za-z]#####/##" Then
OutText = OutText & ", " & Mid(S, X, 11)
ElseIf Mid(S, X, 10) Like "2000######" Or Mid(S, X, 10) Like "1800######" Then
OutText = OutText & ", " & Mid(S, X, 10)
End If
Next
If Len(S) Then InvoiceNumbers = Mid(OutText, 3)
End Function

Note... unlike my previous UDF, this one can be placed anywhere (that is, it does not have to be placed next to the cell it is processing). Also, the calling formula would be the same....

=InvoiceNumbers(A1)

snb
05-22-2013, 02:34 PM
@Rick

You might have misinterpreted my suggestion: it's a UDF; so the result will be the 'filtered' value of another cell: no concatenation, no comma separation etc.

Because I have no idea what the other items look like I can only suggest something for the items provided:


Function snb_002(c00)
snb_002 = Join(Filter(Split(Join(Filter(Split(Replace(c00, "USA", ".USA"), "."), "0"), "")), "0"))
End Function

Rick Rothstein
05-22-2013, 07:13 PM
@Rick

You might have misinterpreted my suggestion: it's a UDF; so the result will be the 'filtered' value of another cell: no concatenation, no comma separation etc.

Because I have no idea what the other items look like I can only suggest something for the items provided:


Function snb_002(c00)
snb_002 = Join(Filter(Split(Join(Filter(Split(Replace(c00, "USA", ".USA"), "."), "0"), "")), "0"))
End Function
Yes, I did misinterpret your code (sorry for not trying it first)... I saw Filter and Join and thought your code was finding the desired "number" parts (no matter what, if anything, was between them) and joining (what I was referring to when I said "concatenating") them together. I see, now, that is not what your code is doing. I do not know if this was a typo on the OP's part or not, but the first data item...

1.JET INTERNATIONALINV NO.ABC12539/01 ABC12540/01TCI12538/01

has two of his "numbers" at the end with no separation between them. Your function preserves that combined pair of "numbers" whereas both of my coded functions separate them... I am guessing adding that functionality to your nicely compact UDF would be difficult to do?

snb
05-23-2013, 12:37 AM
Well it's rather complicated to write concise code for inconsistent data, but in this case:


Function snb_002(c00)
snb_002 = Join(Filter(Split(Replace(Replace(Replace(c00, "USA", " USA"), "TCI", " TCI"), ".", " . ")), "0"))
End Function

larkspur
05-25-2013, 06:55 AM
786

Thanks again snb for your updated UDF, however this extraction is just the first part of what I'm working on. So Rick's original separated field entry works just fine

Next step is to locate those individual numbers on another workbook, then return the corresponding ID number/s and value/s, which are in different cells back to individual cells on this workbook. As I'm new to macros would either of you be able to shed some light on the process (example image attached?)