swifty

Reputation: 1282

Pandas - Split by Numbers and Letters and save last split

I have a series of strings in the format of:

12345678ABC
12345678ABCDEF
12345A6789AB
12A3456ABC

I would like to split only on the beginning of the trailing letters and output like so:

1  12345678       ABC
2  12345678       ABCDEF
3  12345A6789     AB
4  12A3456        ABC

I tried df['ID'].str.split('[a-zA-Z]'), hoping to grab the last piece (index -1) of the split, but the output contains no letters. I'd like to do this in pandas if possible, without resorting to re.

Thanks

Upvotes: 1

Views: 1051

Answers (2)

ALollz

Reputation: 59579

Use a regular expression with Series.str.extract, where the first capturing group is everything up to and including the last digit, and the second capturing group is all of the remaining letters. I've wrapped each capturing group in an optional non-capturing group so that it also works if your string is all numbers or all letters.

import pandas as pd

s = pd.Series(['12345678ABC', '12345678ABCDEF', '12345A6789AB',
               '12A3456ABC', '1234123', 'ABCDERED'])

# Raw string so that \d reaches the regex engine unchanged
s.str.extract(r'(?:(.*\d))?(?:([a-zA-Z]+))?')

Output:

            0         1
0    12345678       ABC
1    12345678    ABCDEF
2  12345A6789        AB
3     12A3456       ABC
4     1234123       NaN
5         NaN  ABCDERED
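To get the two pieces back onto the original frame, something like the following might work. It is only a sketch: the column name ID comes from the question, while the group names prefix and suffix are made up for illustration. It also uses a stricter, anchored pattern that assumes every value has both a digit and trailing letters; rows that do not match get NaN in both columns.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['12345678ABC', '12345678ABCDEF',
                          '12345A6789AB', '12A3456ABC']})

# Named groups become the column names of the extracted frame.
# 'prefix' and 'suffix' are hypothetical names chosen for this example.
parts = df['ID'].str.extract(r'^(?P<prefix>.*\d)(?P<suffix>[A-Za-z]+)$')

# Attach the new columns next to the original ID column.
df = df.join(parts)
print(df)
```

Because `.*` is greedy, the first group runs up to the last digit, so letters in the middle of the string (as in 12345A6789AB) stay in the prefix.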

Upvotes: 2

Ishan Srivastava

Reputation: 1189

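# A is the list of input strings
# pA collects the prefixes (everything up to and including the last digit)
# nA collects the separated trailing letters
A = ['12345678ABC', '12345678ABCDEF', '12345A6789AB', '12A3456ABC']
pA, nA = [], []
for s in A:
    split = 0  # no digit found: the whole string is trailing letters
    for index, character in enumerate(s[::-1]):
        if character.isdigit():
            # index counts from the end, so convert it to a forward position
            split = len(s) - index
            break
    pA.append(s[:split])
    nA.append(s[split:])
for prefix, suffix in zip(pA, nA):
    print(prefix, ' ', suffix)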
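If the strings contain only ASCII letters and digits (an assumption, not something the question guarantees), the same scan can probably be shortened with str.rstrip, which also avoids re. The function name split_trailing_letters below is made up for this sketch.

```python
import string


def split_trailing_letters(s):
    """Return (prefix, trailing_letters) for a single string.

    Assumes s holds only ASCII letters and digits; stripping stops at
    the first non-letter character found when scanning from the right.
    """
    prefix = s.rstrip(string.ascii_letters)
    return prefix, s[len(prefix):]


for s in ['12345678ABC', '12345A6789AB', '1234123', 'ABCDERED']:
    print(split_trailing_letters(s))
```

Since pandas exposes the same method as Series.str.rstrip, the prefix column could likely be computed the same way on a Series.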

Upvotes: 0
