Reputation: 43
For example, I have one pandas column contain
text
A1V2
B2C7Z1
I want split it into 26(A-Z) columns with alphabet followed value, if it is missing, then -1.
So, it can be
text A B C D ... Z
A1V2 1 -1 -1 -1 ... -1
B2C7Z1 -1 2 7 -1 ... 1
Is there any fast way rather than using df.apply()?
Followup: Thank Psidom for the brilliant answer. When I use the method run 4 millions rows, it took me 1 hour. I hope there's another way can make it faster. It seems str.extractall() is the most time-consuming one.
Upvotes: 3
Views: 685
Reputation: 214927
Try str.extractall
with regex (?P<key>[A-Z])(?P<value>[0-9]+)
which extracts the key([A-Z]) value([0-9]+) into separate columns and a long to wide transform should get you there.
Here regex (?P<key>[A-Z])(?P<value>[0-9]+)
matches letterDigits pattern and the two capture groups go into two separate columns in the result as columns key and value (with ?P<>
syntax);
And since extractall puts multiple matches into separate rows, you will need to transform it to wide format with unstack
on the key
column:
(df.text.str.extractall("(?P<key>[A-Z])(?P<value>[0-9]+)")
.reset_index('match', drop=True)
.set_index('key', append=True)
.value.unstack('key').fillna(-1))
#key A B C V Z
# 0 1 -1 -1 2 -1
# 1 -1 2 7 -1 1
Upvotes: 5