Reputation: 109
I have a pandas.core.series.Series with the following data:
0 [00115840, 00110005, 001000033, 00116000...
1 [00267285, 00263627, 00267010, 0026513...
2 [00335595, 00350750]
I want to remove the leading zeros from the series. I tried
x.astype('int64')
but got the error message
ValueError: setting an array element with a sequence.
Can you suggest how to do this in Python 3.x?
Upvotes: 7
Views: 8449
Reputation: 837
If you want a more concise solution, you could try the following, assuming a is the original series:
b = a.explode().astype(int)
a = b.groupby(b.index).agg(list)
However, this is slower than the solutions posted by @cs95 and @jezrael.
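A self-contained version of the snippet above, for anyone who wants to run it directly. The full lists are taken from the sample in @jezrael's answer, since the question's printout is truncated. Series.explode needs pandas 0.25 or later.

```python
import pandas as pd

# Sample data: each row holds a list of zero-padded strings
a = pd.Series([['00115840', '00110005', '001000033', '00116000'],
               ['00267285', '00263627', '00267010', '0026513'],
               ['00335595', '00350750']])

# One row per list element; the original index label is repeated per element,
# and astype(int) parses each string, dropping the leading zeros
b = a.explode().astype(int)

# Collect the integers back into one list per original row label
result = b.groupby(b.index).agg(list)
print(result)
```

This sketch assumes every list is non-empty: explode turns an empty list into a single NaN row, which astype(int) cannot cast.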
Upvotes: 1
Reputation: 1329
The line below should work if the column has mixed dtypes (strings and non-strings):
df['col'] = df['col'].apply(lambda x: x.lstrip('0') if isinstance(x, str) else x)
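The lstrip approach targets a column of plain strings. For a Series of lists, as in the question, a minimal sketch applies the same stripping inside each list. The helper name strip_zeros is hypothetical, the values stay strings, and mapping an all-zero string such as '000' to '0' (instead of the empty string lstrip would return) is an assumption about the desired output.

```python
import pandas as pd


def strip_zeros(values):
    """Hypothetical helper: strip leading zeros from every string in a list."""
    # `or '0'` keeps all-zero strings like '000' from becoming ''
    return [(v.lstrip('0') or '0') if isinstance(v, str) else v for v in values]


s = pd.Series([['00115840', '000'], ['0026513', 42]])
stripped = s.apply(strip_zeros)  # non-string items such as 42 pass through
print(stripped)
```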
Upvotes: 0
Reputation: 862511
If you want to convert the lists of strings to lists of integers, use a list comprehension:
s = pd.Series([[int(y) for y in x] for x in s], index=s.index)
Or with apply:
s = s.apply(lambda x: [int(y) for y in x])
Sample:
a = [['00115840', '00110005', '001000033', '00116000'],
['00267285', '00263627', '00267010', '0026513'],
['00335595', '00350750']]
s = pd.Series(a)
print (s)
0 [00115840, 00110005, 001000033, 00116000]
1 [00267285, 00263627, 00267010, 0026513]
2 [00335595, 00350750]
dtype: object
s = s.apply(lambda x: [int(y) for y in x])
print (s)
0 [115840, 110005, 1000033, 116000]
1 [267285, 263627, 267010, 26513]
2 [335595, 350750]
dtype: object
EDIT:
If you only want a flat Series of integers, you can flatten the values and cast them to int:
s = pd.Series([item for sublist in s for item in sublist]).astype(int)
Alternative solution:
import itertools
s = pd.Series(list(itertools.chain(*s))).astype(int)
print (s)
0 115840
1 110005
2 1000033
3 116000
4 267285
5 263627
6 267010
7 26513
8 335595
9 350750
dtype: int32
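The dtype: int32 above comes from astype(int), whose width follows NumPy's default integer for the platform (32-bit on Windows with NumPy 1.x). A sketch that requests 'int64' explicitly, and uses itertools.chain.from_iterable instead of unpacking the Series with *:

```python
import itertools

import pandas as pd

s = pd.Series([['00115840', '00110005', '001000033', '00116000'],
               ['00267285', '00263627', '00267010', '0026513'],
               ['00335595', '00350750']])

# chain.from_iterable walks the lists lazily instead of passing them as arguments
flat = pd.Series(list(itertools.chain.from_iterable(s))).astype('int64')
print(flat.dtype)
```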
Timings:
a = [['00115840', '00110005', '001000033', '00116000'],
['00267285', '00263627', '00267010', '0026513'],
['00335595', '00350750']]
s = pd.Series(a)
s = pd.concat([s]*1000).reset_index(drop=True)
In [203]: %timeit pd.Series([[int(y) for y in x] for x in s], index=s.index)
100 loops, best of 3: 4.66 ms per loop
In [204]: %timeit s.apply(lambda x: [int(y) for y in x])
100 loops, best of 3: 5.13 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ sol
In [205]: %%timeit
...: v = pd.Series(np.concatenate(s.values.tolist()))
...: v.astype(int).groupby(s.index.repeat(s.str.len())).agg(pd.Series.tolist)
...:
1 loop, best of 3: 226 ms per loop
#Wen solution
In [211]: %timeit pd.Series(s.apply(pd.Series).stack().astype(int).groupby(level=0).apply(list))
1 loop, best of 3: 1.12 s per loop
Solutions with flattening (idea of @cᴏʟᴅsᴘᴇᴇᴅ):
In [208]: %timeit pd.Series([item for sublist in s for item in sublist]).astype(int)
100 loops, best of 3: 2.55 ms per loop
In [209]: %timeit pd.Series(list(itertools.chain(*s))).astype(int)
100 loops, best of 3: 2.2 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ sol
In [210]: %timeit pd.Series(np.concatenate(s.values.tolist()))
100 loops, best of 3: 7.71 ms per loop
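The %timeit and %%timeit magics above only run in IPython. In a plain script, the standard-library timeit module gives a rough equivalent; absolute numbers depend on the machine and the pandas version, so none are claimed here.

```python
import itertools
import timeit

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]
s = pd.concat([pd.Series(a)] * 1000).reset_index(drop=True)

candidates = {
    'list comprehension': lambda: pd.Series([[int(y) for y in x] for x in s], index=s.index),
    'apply': lambda: s.apply(lambda x: [int(y) for y in x]),
    'flatten + chain': lambda: pd.Series(list(itertools.chain(*s))).astype(int),
}

# Best of 3 repeats of 10 calls each, reported per call in milliseconds
timings = {}
for name, func in candidates.items():
    best = min(timeit.repeat(func, repeat=3, number=10)) / 10
    timings[name] = best * 1000
    print(f'{name}: {timings[name]:.2f} ms per call')
```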
Upvotes: 3
Reputation: 323226
s=pd.Series(s.apply(pd.Series).astype(int).values.tolist())
s
Out[282]:
0 [1, 2]
1 [3, 4]
dtype: object
Input data:
s=pd.Series([['001','002'],['003','004']])
Update: Thanks to Jez and cold for pointing it out :-) A version that also handles lists of different lengths:
pd.Series(s.apply(pd.Series).stack().astype(int).groupby(level=0).apply(list))
Out[317]:
0 [115840, 110005, 1000033, 116000]
1 [267285, 263627, 267010, 26513]
2 [335595, 350750]
dtype: object
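A small sketch of why the first version, s.apply(pd.Series).astype(int), does not handle lists of unequal length, which is the case in the question: apply(pd.Series) widens the lists into columns and pads the shorter rows with NaN, and NaN has no plain integer representation.

```python
import pandas as pd

s = pd.Series([['001', '002'], ['003']])

# Each list becomes a row; the missing second element of row 1 is NaN
wide = s.apply(pd.Series)
print(wide)

try:
    wide.astype(int)
    cast_failed = False
except (ValueError, TypeError):
    # Casting the NaN padding to int raises
    cast_failed = True
print(cast_failed)
```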
Upvotes: 4
Reputation: 402333
Flatten your data with np.concatenate:
s
0 [00115840, 36869, 262171, 39936]
1 [00267285, 92055, 93704, 11595]
2 [00335595, 119272]
Name: 1, dtype: object
v = pd.Series(np.concatenate(s.tolist()))
Or, using .values.tolist(), which is faster (thanks to jezrael for the suggestion):
v = pd.Series(np.concatenate(s.values.tolist()))
v
0 00115840
1 36869
2 262171
3 39936
4 00267285
5 92055
6 93704
7 11595
8 00335595
9 119272
dtype: object
Now, what you were doing with astype should work:
v.astype(int)
0 115840
1 36869
2 262171
3 39936
4 267285
5 92055
6 93704
7 11595
8 335595
9 119272
dtype: int64
If your data are floats, use astype(float) instead.
If you want, you can reshape the result back to its original format using groupby + agg:
v.astype(int).groupby(s.index.repeat(s.str.len())).agg(pd.Series.tolist)
0 [115840, 36869, 262171, 39936]
1 [267285, 92055, 93704, 11595]
2 [335595, 119272]
dtype: object
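The grouping key in the last step maps each flattened element back to the row it came from: s.str.len() gives the length of each list, and s.index.repeat repeats every row label that many times. A small sketch with made-up sample values:

```python
import numpy as np
import pandas as pd

s = pd.Series([['001', '020'], ['300'], ['004', '050', '600']])

lengths = s.str.len()           # list length per row: 2, 1, 3
key = s.index.repeat(lengths)   # each row label repeated once per element
v = pd.Series(np.concatenate(s.values.tolist())).astype(int)

# Elements that share a row label are collected back into one list
regrouped = v.groupby(key).agg(pd.Series.tolist)
print(list(key))
print(regrouped)
```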
Upvotes: 2