Reputation: 29
I want just the first 10 characters of each value in the array.
Here is the array:
array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000'])
I would like to write code that will give me this:
array(['2018-06-30','2018-06-30' .... etc
Here's an update: My code is:
x = np.array(df4['per_end_date'])
x
the output is:
array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000',
'2018-09-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000', etc
I would like just the first 10 characters of each value in the array. The following code gives me the error IndexError: invalid index to scalar variable.
x = np.array([y[:9] for y in x])
Upvotes: 1
Views: 650
Reputation: 29
Okay, I figured it out.
df4['per_end_date'].dtype
output:
dtype('<M8[ns]')
So, the following code worked perfectly.
x = np.array(df4['per_end_date'], dtype='datetime64[D]')
x
output:
array(['2018-06-30', '2018-06-30', '2018-06-30', '2018-06-30',
'2018-06-30', '2018-06-30', '2018-06-30', '2018-09-30',
'2018-09-30', '2018-09-30', '2018-09-30', '2018-09-30',
'2018-09-30', '2018-09-30', '2018-09-30', '2018-09-30', etc
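For anyone landing here later, here is a minimal sketch of why the slicing failed and two ways around it. It assumes the column holds datetime64[ns] values, as the dtype above suggests; the small array `x` is an illustrative stand-in for `np.array(df4['per_end_date'])`, not the real data.

```python
import numpy as np

# Stand-in for np.array(df4['per_end_date']); these values are assumed for illustration.
x = np.array(['2018-06-30', '2018-06-30', '2018-09-30'], dtype='datetime64[ns]')

# The elements are numpy.datetime64 scalars, not str, so slicing one raises IndexError.
try:
    x[0][:10]
    slicing_failed = False
except IndexError:
    slicing_failed = True

# Option 1: reduce the precision to whole days (the result is still datetime64).
days = x.astype('datetime64[D]')

# Option 2: produce real strings, formatted at day precision.
day_strings = np.datetime_as_string(x, unit='D')

print(slicing_failed, days.dtype, day_strings)
```

The cast keeps the values as dates, which is usually what you want for further date arithmetic. If you need text (for labels or export, say), `np.datetime_as_string` is probably the better fit.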
Great when you can figure it out. :)
Upvotes: 0
Reputation: 51185
Although numpy isn't always the best way to manipulate strings, you can vectorize this operation, and vectorized functions should generally be preferred to iteration.
Setup
arr = np.array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000'],
dtype='<U29')
Using np.frombuffer
np.frombuffer(
    arr.view((str, 1)).reshape(arr.shape[0], -1)[:, :10].tobytes(),
    dtype=(str, 10)
)
array(['2018-06-30', '2018-06-30', '2018-06-30', '2018-06-30',
'2018-06-30', '2018-06-30', '2018-06-30', '2018-09-30'],
dtype='<U10')
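To make the one-liner easier to follow, here is a sketch that splits it into steps, using a shortened two-element version of the setup array. It calls `tobytes()`, which newer NumPy versions provide in place of `tostring()`. The intermediate names (`chars`, `head`, `via_buffer`, `via_cast`) are mine, and the `astype` line at the end is an alternative I am adding for comparison, not part of the approach above.

```python
import numpy as np

# Shortened version of the setup array, for illustration only.
arr = np.array(['2018-06-30T00:00:00.000000000',
                '2018-09-30T00:00:00.000000000'], dtype='<U29')

# 1. Reinterpret each 29-character string as 29 one-character strings.
chars = arr.view((str, 1)).reshape(arr.shape[0], -1)   # shape (2, 29)

# 2. Keep the first 10 columns, i.e. the first 10 characters of every row.
head = chars[:, :10]                                   # shape (2, 10)

# 3. Copy the selection into contiguous bytes and reread it as 10-character strings.
via_buffer = np.frombuffer(head.tobytes(), dtype=(str, 10))

# Alternative: casting to a shorter fixed-width string dtype truncates each element.
via_cast = arr.astype('<U10')

print(via_buffer)
print(via_cast)
```

Both routes only work on an array that already holds fixed-width strings; note also that the array returned by `np.frombuffer` is read-only, so copy it if you need to modify it.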
Timings
arr = np.repeat(arr, 10000)
%timeit np.array([y[:10] for y in arr])
48.6 ms ± 961 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
np.frombuffer(
    arr.view((str, 1)).reshape(arr.shape[0], -1)[:, :10].tobytes(),
    dtype=(str, 10)
)
6.87 ms ± 311 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.array(arr,dtype= 'datetime64[D]')
44.9 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 1
Reputation: 1857
This is a fairly basic task of working with lists in Python:
import numpy
x = numpy.array(['2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-06-30T00:00:00.000000000',
'2018-06-30T00:00:00.000000000', '2018-09-30T00:00:00.000000000'])
numpy.array([y[:10] for y in x])
# array(['2018-06-30', '2018-06-30', '2018-06-30', '2018-06-30',
#        '2018-06-30', '2018-06-30', '2018-06-30', '2018-09-30'],
#       dtype='<U10')
For more information you should read a bit of documentation on list comprehensions.
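One caveat: the comprehension above assumes the array holds strings. If the values are actually datetime64 (as the IndexError in the question suggests), a hedged variant is to convert each element with `str()` before slicing. The sample array below is assumed for illustration.

```python
import numpy as np

# Assumed sample; mirrors the datetime64[ns] data described in the question.
x = np.array(['2018-06-30', '2018-09-30'], dtype='datetime64[ns]')

# str() on a datetime64 scalar yields its ISO 8601 text, which can then be sliced.
result = np.array([str(y)[:10] for y in x])

print(result)
```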
Upvotes: 0