Split a series to several columns based on length in Python

Question

I have a series that looks like this:

01 1ABCD     E    1   4.011   3.952   7.456 -0.3096  1.0132  0.2794

02 1ABCD     F    2   4.088   3.920   7.517  0.3839 -0.5482 -1.3874

...

I want to split it into 10 columns based on length: the first 4 characters (including spaces) = column 1, the second 5 characters = column 2, ..., the last 8 characters = column 10.

The result should be something like this:

column1	column2	column3	....	column10
01 1	ABCD	E	.....	0.2794
02 1	ABCD	F	....	-1.3874

How can I do this in Python?

Thanks

Mehrnoosh

Valdi_Bo · Accepted Answer

An elegant solution is to:

Start with a list of sizes (how many chars should be in each "segment").
Create a (compiled) Regex pattern with named capturing groups, each capturing a stated number of chars.
Use str.extract to extract the required substrings from your Series. Group names will be used as names of output columns.

Assuming that s is the source Series, the code to do it is:

import re

# Define size of each group
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]
# Generate the pattern string and compile it
# Each group is named Column1, Column2, ... and captures exactly n characters
pat = re.compile(''.join([ f'(?P<Column{idx}>.{{{n}}})'
    for idx, n in enumerate(sizes, start=1) ]))
# Generate the result
result = s.str.extract(pat)

The result is:

  Column1 Column2 Column3 Column4   Column5   Column6   Column7   Column8  Column9  Column10
0    01 1    ABCD       E       1     4.011     3.952     7.456   -0.3096   1.0132    0.2794 
1    02 1    ABCD       F       2     4.088     3.920     7.517    0.3839  -0.5482   -1.3874

Note that all columns in result have object dtype (they hold strings, padding spaces included). So to do any sensible processing on them, you should probably:

strip spaces from each column (both leading and trailing),
convert some columns to either int or float.
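
The two clean-up steps above could look roughly like the sketch below. It rebuilds result from the two sample rows in the question so that it runs on its own. Which columns are int and which are float is an assumption based on that sample (Column4 as int, Column5 to Column10 as float), and the name cleaned is only illustrative. The pattern is passed as a plain string here, which str.extract also accepts.

```python
import pandas as pd

# Sample rows copied from the question; s stands in for the source Series.
s = pd.Series([
    "01 1ABCD     E    1   4.011   3.952   7.456 -0.3096  1.0132  0.2794",
    "02 1ABCD     F    2   4.088   3.920   7.517  0.3839 -0.5482 -1.3874",
])

# Same fixed-width extraction as in the answer, with the pattern kept as a string.
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]
pat = ''.join(f'(?P<Column{idx}>.{{{n}}})'
              for idx, n in enumerate(sizes, start=1))
result = s.str.extract(pat)

# Step 1: strip leading and trailing spaces from every column.
cleaned = result.apply(lambda col: col.str.strip())

# Step 2: convert columns to numeric types.
# Assumption: Column4 holds integers, Column5..Column10 hold floats.
dtypes = {'Column4': int}
dtypes.update({f'Column{i}': float for i in range(5, 11)})
cleaned = cleaned.astype(dtypes)

print(cleaned.dtypes)
```

If some rows may contain malformed numbers, pd.to_numeric with errors='coerce' is a more forgiving alternative to astype, since it turns unparseable values into NaN instead of raising.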
