I am trying to split CSV data that contains arrays into multiple columns. This works for most arrays because they contain whole numbers, but when I try to split the following array (which contains decimal values) I run into a problem.
Here is the example. Suppose the following array data is stored in a column called "Array":
```
{58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5}
```
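To make this reproducible, here is a minimal sketch that loads data of this shape into a DataFrame. It assumes the CSV is comma-delimited with a header row and that the column is named `Array`; the two rows are placeholder data, not the real file.

```python
import io

import pandas as pd

# Placeholder CSV contents (assumption): one header row, one column "Array",
# each cell holding "|"-separated values wrapped in curly braces.
csv_text = """Array
{58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5}
{58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5}
"""

# read_csv accepts any file-like object, so StringIO stands in for the real file.
# The cells contain no commas, so each one is read as a single string.
raw_data = pd.read_csv(io.StringIO(csv_text))
print(raw_data["Array"].iloc[0])
```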
If I apply the following:
```python
splitted_data = raw_data["Array"].str.split(r"\D", expand=True).add_prefix("setrwds_x")
```
I get the following result:
```
setrwds_x1 setrwds_x2 setrwds_x4 setrwds_x5 setrwds_x7 setrwds_x8 \
0 58 5 58 5 58 5
1 58 5 58 5 58 5
2 58 5 58 5 58 5
3 58 5 58 5 58 5
4 58 5 58 5 58 5
5 58 5 58 5 58 5
6 58 5 58 5 58 5
7 58 5 58 5 58 5
8 58 5 58 5 58 5
9 58 5 58 5 58 5
10 58 5 58 5 58 5
11 58 5 58 5 58 5
12 58 5 58 5 58 5
13 58 5 58 5 58 5
14 58 5 58 5 58 5
15 58 5 58 5 58 5
16 58 5 58 5 58 5
17 58 5 58 5 58 5
```
It splits each 58.5 into two columns (58 and 5), which is wrong; I need to keep 58.5 as a single value.
Does anyone have advice on how to solve this?
Try this. `\D` in regex stands for any non-digit character, which includes `|` but also `.` (as well as the spaces and the braces), so you want to split explicitly on `|` only. You also need to drop the first and last characters (the braces) using `str[1:-1]`:
```python
raw_data["Array"].str[1:-1].str.split("|", expand=True).add_prefix("setrwds_x")
```
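To see why `\D` breaks the numbers apart, here is a small sketch using Python's built-in `re` module on a shortened cell (the three-value string `cell` is made up for illustration). It compares the regex split with a plain split on the literal `|`:

```python
import re

cell = "{58.5 |58.5 |58.5}"

# \D matches any single non-digit: "{", ".", " ", "|" and "}" all qualify,
# so the decimal point inside 58.5 becomes a split point as well.
print(re.split(r"\D", cell))
# -> ['', '58', '5', '', '58', '5', '', '58', '5', '']

# Dropping the braces and splitting on the literal "|" keeps each number whole;
# each piece except the last still carries the space that preceded the "|".
print(cell[1:-1].split("|"))
# -> ['58.5 ', '58.5 ', '58.5']
```

The empty strings in the first result are what produce the empty columns in the question's output.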
Tested this with a dummy series:
```python
import pandas as pd

# Dummy series
d = ['{58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5}',
     '{58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5}',
     '{58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5 |58.5}']
dd = pd.Series(d)

# Strip the surrounding braces, then split only on the literal "|"
out = dd.str[1:-1].str.split("|", expand=True).add_prefix("setrwds_x")
print(out)
```
```
  setrwds_x0 setrwds_x1 setrwds_x2 setrwds_x3 setrwds_x4 setrwds_x5 \
0       58.5       58.5       58.5       58.5       58.5       58.5
1       58.5       58.5       58.5       58.5       58.5       58.5
2       58.5       58.5       58.5       58.5       58.5       58.5

  setrwds_x6 setrwds_x7 setrwds_x8 setrwds_x9 setrwds_x10 setrwds_x11
0       58.5       58.5       58.5       58.5        58.5        58.5
1       58.5       58.5       58.5       58.5        58.5        58.5
2       58.5       58.5       58.5       58.5        58.5        58.5
```
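The split columns are still strings, and most carry a trailing space (for example `'58.5 '`). If you need actual numbers, one possible extension is sketched below. It assumes every value between the `|` separators is numeric. It uses `str.strip("{}")` as an alternative to `str[1:-1]` that removes only brace characters:

```python
import pandas as pd

# Shortened dummy data in the same format as the question (illustrative only)
dd = pd.Series(['{58.5 |58.5 |58.5 |58.5}',
                '{58.5 |58.5 |58.5 |58.5}'])

out = (
    dd.str.strip("{}")                      # remove only the surrounding braces
      .str.split("|", expand=True)          # one column per "|"-separated value
      .apply(lambda col: col.str.strip())   # drop the space left before each "|"
      .astype(float)                        # "58.5" strings -> 58.5 floats
      .add_prefix("setrwds_x")
)
print(out)
print(out.dtypes)
```

If some cells might not be numeric, `pd.to_numeric(..., errors="coerce")` applied per column is a more forgiving alternative to `astype(float)`.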