Reputation: 1282
I have a series of strings in the format of:
12345678ABC
12345678ABCDEF
12345A6789AB
12A3456ABC
I would like to split only on the beginning of the trailing letters and output like so:
1 12345678 ABC
2 12345678 ABCDEF
3 12345A6789 AB
4 12A3456 ABC
I tried df['ID'].str.split('[a-zA-Z]') hoping to grab the last (-1) piece of the split, but the output contains no letters. I'm hoping to do this in pandas, if possible, without resorting to re.
Thanks
Upvotes: 1
Views: 1051
Reputation: 59579
Use a regular expression with Series.str.extract, where the first capturing group is everything up to the last digit and the second capturing group is all of the remaining letters. I've made both groups optional so that it also works if a string is all numbers or all letters.
import pandas as pd

s = pd.Series(['12345678ABC', '12345678ABCDEF', '12345A6789AB',
               '12A3456ABC', '1234123', 'ABCDERED'])
# Raw string so that \d is passed to the regex engine unchanged
s.str.extract(r'(?:(.*\d))?(?:([a-zA-Z]+))?')
Output:
            0         1
0    12345678       ABC
1    12345678    ABCDEF
2  12345A6789        AB
3     12A3456       ABC
4     1234123       NaN
5         NaN  ABCDERED
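To attach the result to the original frame, something like the following sketch may help. It assumes the strings live in a column named ID, as in the question. The group names prefix and suffix are made up for illustration. Named groups in the pattern become the column names of the extracted frame.

```python
import pandas as pd

# Hypothetical frame; 'ID' is the column name used in the question.
df = pd.DataFrame({'ID': ['12345678ABC', '12345678ABCDEF',
                          '12345A6789AB', '12A3456ABC']})

# Same pattern as above, with named groups ('prefix' and 'suffix' are
# illustrative names) so the extracted columns get readable labels.
parts = df['ID'].str.extract(r'(?P<prefix>.*\d)?(?P<suffix>[a-zA-Z]+)?')

# Join the two new columns onto the original frame by index.
result = df.join(parts)
print(result)
```

Because .* is greedy, the first group runs up to the last digit in the string, so letters embedded in the middle (as in 12345A6789AB) stay in the prefix.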
Upvotes: 2
Reputation: 1189
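# Let A be the list of input strings
A = ['12345678ABC', '12345678ABCDEF', '12345A6789AB', '12A3456ABC']
nA = []  # trailing letters of each string
pA = []  # prefix of each string, up to and including the last digit
for s in A:
    t = 0  # split position; stays 0 if the string contains no digit
    # Walk the string backwards until the first digit is found
    for index, character in enumerate(s[::-1]):
        if character.isdigit():
            t = len(s) - index  # convert the reversed index to a forward position
            break
    pA.append(s[:t])
    nA.append(s[t:])
for prefix, suffix in zip(pA, nA):
    print(prefix, suffix)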
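As a variation on the loop above, the trailing letters can also be peeled off with str.rstrip instead of scanning character by character. This is only a sketch: split_trailing_letters is a hypothetical helper name, and it assumes "letters" means ASCII letters only.

```python
import string

def split_trailing_letters(s):
    """Split s into (prefix, trailing ASCII letters)."""
    # rstrip removes every trailing character found in the given set,
    # stopping at the first character (from the right) that is not a letter.
    prefix = s.rstrip(string.ascii_letters)
    return prefix, s[len(prefix):]

for item in ['12345678ABC', '12345678ABCDEF', '12345A6789AB', '12A3456ABC']:
    print(*split_trailing_letters(item))
```

A string with no trailing letters comes back with an empty suffix, and an all-letter string comes back with an empty prefix.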
Upvotes: 0