split one pandas column text to multiple columns

Question

For example, I have one pandas column contain

text
A1V2
B2C7Z1

I want split it into 26(A-Z) columns with alphabet followed value, if it is missing, then -1.

So, it can be

text    A  B  C  D ...  Z
A1V2    1  -1 -1 -1 ... -1
B2C7Z1 -1  2  7  -1 ... 1

Is there any fast way rather than using df.apply()?

Followup: Thank Psidom for the brilliant answer. When I use the method run 4 millions rows, it took me 1 hour. I hope there's another way can make it faster. It seems str.extractall() is the most time-consuming one.

akuiper · Accepted Answer

Try str.extractall with regex (?P[A-Z])(?P[0-9]+) which extracts the key([A-Z]) value([0-9]+) into separate columns and a long to wide transform should get you there.

Here regex (?P[A-Z])(?P[0-9]+) matches letterDigits pattern and the two capture groups go into two separate columns in the result as columns key and value (with ?P<> syntax);

And since extractall puts multiple matches into separate rows, you will need to transform it to wide format with unstack on the key column:

(df.text.str.extractall("(?P[A-Z])(?P[0-9]+)")
 .reset_index('match', drop=True)
 .set_index('key', append=True)
 .value.unstack('key').fillna(-1))

#key    A   B   C   V   Z
#  0    1  -1  -1   2  -1
#  1   -1   2   7  -1   1

split one pandas column text to multiple columns

Answers (1)

Related Questions