Reputation: 69
I have a series that looks like this:
01 1ABCD E 1 4.011 3.952 7.456 -0.3096 1.0132 0.2794
02 1ABCD F 2 4.088 3.920 7.517 0.3839 -0.5482 -1.3874
...
I want to split it into 10 columns based on character position: the first 4 characters (including spaces) = column 1, the next 5 characters = column 2, ..., the last 8 characters = column 10.
The result should be something like this:
column1 | column2 | column3 | .... | column10 |
---|---|---|---|---|
01 1 | ABCD | E | ..... | 0.2794 |
02 1 | ABCD | F | .... | -1.3874 |
How can I do this in Python?
Thanks
Mehrnoosh
Upvotes: 1
Views: 313
Reputation: 30991
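An elegant solution is to build a regular expression with one named capturing group per column, each group matching a fixed number of characters, and then pass that pattern to str.extract.
Assuming that s is the source Series, the code to do it is: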
import re
# Define size of each group
sizes = [4, 4, 6, 5, 8, 8, 8, 8, 8, 8]
# Generate the pattern string and compile it
pat = re.compile(''.join([f'(?P<Column{idx}>.{{{n}}})'
                          for idx, n in enumerate(sizes, start=1)]))
# Generate the result
result = s.str.extract(pat)
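To see what the generated pattern looks like, here is a minimal sketch using only the standard re module. The widths and the sample line are shortened, hypothetical values chosen for illustration, not the real layout from the question:

```python
import re

# Hypothetical widths and sample line, chosen only for illustration.
sizes = [4, 4, 2]
line = "01 1ABCD E"

# Each width n becomes a named group that matches exactly n characters.
pattern = ''.join(f'(?P<Column{idx}>.{{{n}}})'
                  for idx, n in enumerate(sizes, start=1))
print(pattern)

# Match the line and collect the captured fields by group name.
match = re.match(pattern, line)
fields = match.groupdict()
print(fields)
```

The doubled braces in the f-string produce literal { and }, so a width of 4 turns into the quantifier {4}. str.extract then uses the group names as column names.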
The result is:
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9 Column10
0 01 1 ABCD E 1 4.011 3.952 7.456 -0.3096 1.0132 0.2794
1 02 1 ABCD F 2 4.088 3.920 7.517 0.3839 -0.5482 -1.3874
But note that all columns in result are of object dtype (they actually hold strings). So to perform any sensible processing on them, you should probably convert them to appropriate types first, for example the numeric columns to numbers.
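One possible way to do that conversion is sketched below. The small DataFrame is a hypothetical stand-in for the extracted result (only three of the ten columns, with made-up padding), and which columns are text or numeric is an assumption based on the sample data:

```python
import pandas as pd

# Hypothetical stand-in for the extracted result: every value is a string.
result = pd.DataFrame({
    'Column3': ['     E', '     F'],
    'Column5': ['   4.011', '   4.088'],
    'Column8': [' -0.3096', '  0.3839'],
})

# Assumed split between text and numeric columns.
text_cols = ['Column3']
numeric_cols = ['Column5', 'Column8']

# Remove the padding left over from the fixed-width layout.
for col in text_cols:
    result[col] = result[col].str.strip()

# Strip padding, then convert the strings to numbers.
for col in numeric_cols:
    result[col] = pd.to_numeric(result[col].str.strip())

print(result.dtypes)
```

With the real data you would list all ten columns in the appropriate group.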
Upvotes: 3