Reputation: 5712
I have a column in my CSV file in the following way
4048.187796
4254.6215672333340-0-0-
4229.9155995666671-0-0-
4427.0494321833340-0-0-
4303.428593050-0-0-
4256.6235064166670-0-0-
4132.5399525833330-0-0-
4263.5820142833341-0-0-
4320.6955591833340-0-0-
4342.1270119333330-0-0-
4447.8283416833340-0-0-
4409.2305202500010-0-0-
4280.650570850-1-0-
4283.5942898166680-0-0-
4341.1896358666670-0-0-
4263.1282187000010-0-0-
4222.3119095333330-0-0-
4314.9844073333331-0-0-
The format of the value is some fraction value + ((1|0)-.*)?
Some lines will only have a fraction value. E,g. the 1st line.
I want to split this into two columns as follows
4048.187796,
4254.621567233334, 0-0-0-
4229.915599566667, 1-0-0-
4427.049432183334, 0-0-0-
4303.42859305, 0-0-0-
4256.623506416667, 0-0-0-
4132.539952583333, 0-0-0-
4263.582014283334, 1-0-0-
4320.695559183334, 0-0-0-
4342.127011933333, 0-0-0-
4447.828341683334, 0-0-0-
4409.230520250001, 0-0-0-
4280.65057085, 0-1-0-
4283.594289816668, 0-0-0-
4341.189635866667, 0-0-0-
4263.128218700001, 0-0-0-
4222.311909533333, 0-0-0-
4314.984407333333, 1-0-0-
I can do this by reading line by line and then manipulate each value by finding the index of '-' and substring that by index - 2. But as I have several files and each file have more than 1000 lines I don't want to do that. Is there a way for me to do this directly using panda and slice functions?
I tried df['new_col'] = df['last'].str.slice
But Ican't give a fix value to slice start index as it changes from row to row
Upvotes: 0
Views: 2089
Reputation: 323226
We can do two steps with contains
and replace
df['New']=np.where(df.Check.str.contains('-'),df.Check.str[-6:],'')
df.Check=df.Check.replace(regex=r'(?i)'+ df['New'],value="")
df
Check New
0 4048.187796
1 4254.621567233334 0-0-0-
2 4229.915599566667 1-0-0-
3 4427.049432183334 0-0-0-
4 4303.42859305 0-0-0-
5 4256.623506416667 0-0-0-
6 4132.539952583333 0-0-0-
7 4263.582014283334 1-0-0-
8 4320.695559183334 0-0-0-
9 4342.127011933333 0-0-0-
10 4447.828341683334 0-0-0-
11 4409.230520250001 0-0-0-
12 4280.65057085 0-1-0-
13 4283.594289816668 0-0-0-
14 4341.189635866667 0-0-0-
15 4263.128218700001 0-0-0-
16 4222.311909533333 0-0-0-
17 4314.984407333333 1-0-0-
Upvotes: 1
Reputation: 214927
Try regular expression ^([0-9.]+)((?:[01]-)*)$
+ str.extract
:
df.last.str.extract('^([0-9.]+)((?:[01]-)*)$')
# 0 1
#0 4048.187796
#1 4254.621567233334 0-0-0-
#2 4229.915599566667 1-0-0-
#3 4427.049432183334 0-0-0-
#4 4303.42859305 0-0-0-
#5 4256.623506416667 0-0-0-
#6 4132.539952583333 0-0-0-
#7 4263.582014283334 1-0-0-
#8 4320.695559183334 0-0-0-
#9 4342.127011933333 0-0-0-
#10 4447.828341683334 0-0-0-
#11 4409.230520250001 0-0-0-
#12 4280.65057085 0-1-0-
#13 4283.594289816668 0-0-0-
#14 4341.189635866667 0-0-0-
#15 4263.128218700001 0-0-0-
#16 4222.311909533333 0-0-0-
#17 4314.984407333333 1-0-0-
Play.
Upvotes: 2