Pandas - Split by Numbers and Letters and save last split

Question

I have a series of strings in the format of:

12345678ABC
12345678ABCDEF
12345A6789AB
12A3456ABC

I would like to split only on the beginning of the trailing letters and output like so:

1  12345678       ABC
2  12345678       ABCDEF
3  12345A6789     AB
4  12A3456        ABC

The preceding 'number' string can contain some A-Z characters, as in rows 3 and 4.
Both 'number' and 'letter' are of variable length ('letter' is capped at a maximum of 6 characters).

I tried df['ID'].str.split('[a-zA-Z]'), hoping to grab the last split with [-1], but the output contains no letters. I'm hoping to do this in pandas, if possible, without resorting to re.

Thanks

ALollz · Accepted Answer

Use a regular expression with Series.str.extract, where the first capturing group is everything up to and including the last digit, and the second capturing group is all of the remaining letters. I've made both groups optional so that it also works if a string is all numbers or all letters.

import pandas as pd

s = pd.Series(['12345678ABC', '12345678ABCDEF', '12345A6789AB',
               '12A3456ABC', '1234123', 'ABCDERED'])

# Raw string so that \d is passed to the regex engine unchanged
s.str.extract(r'(?:(.*\d))?(?:([a-zA-Z]+))?')

Output:

            0         1
0    12345678       ABC
1    12345678    ABCDEF
2  12345A6789        AB
3     12A3456       ABC
4     1234123       NaN
5         NaN  ABCDERED
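
To get labelled columns back on the original frame, the same idea can be sketched with named groups. This is only an illustration: the column name 'ID' comes from the question, the group names `number` and `letter` are chosen here to match the question's wording, and the non-capturing wrappers are dropped because making each named group optional directly should behave the same way.

```python
import pandas as pd

# Sample frame; the 'ID' column name is taken from the question.
df = pd.DataFrame({"ID": ["12345678ABC", "12345678ABCDEF",
                          "12345A6789AB", "12A3456ABC"]})

# Named groups become the column names of the extracted frame.
# 'number': everything up to and including the last digit (greedy).
# 'letter': the trailing run of letters, if any.
pattern = r"(?P<number>.*\d)?(?P<letter>[a-zA-Z]+)?"

parts = df["ID"].str.extract(pattern)

# Attach the two new columns next to the original IDs.
result = df.join(parts)
print(result)
```

Because `.*` is greedy, the first group backtracks only as far as the last digit, which is why letters embedded in the 'number' part (rows like 12345A6789AB) stay in the first column.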
