jgrant

Reputation: 1433

Replace a single character in a Numpy list of strings

I have a Numpy array of datetime64 objects that I need to convert to a specific time format: yyyy-mm-dd,HH:MM:SS.SSS. Numpy has a function called datetime_as_string that outputs ISO 8601 time (yyyy-mm-ddTHH:MM:SS.SSS), which is extremely close to what I want. The only difference is that there is a T where I want a comma.

Is there a way to quickly swap the "T" for a ","? Here is an example data set:

offset = np.arange(0, 1000)
epoch = np.datetime64('1970-01-01T00:00:00.000')
time_objects = epoch + offset.astype('timedelta64[ms]')
time_strings = np.datetime_as_string(time_objects)

I have had success using a lambda and a list comprehension, but it seems awkward switching back and forth from a Python list to a Numpy array.

f = lambda x: x[:10] + ',' + x[11:]
np.array([f(x) for x in time_strings])

I know that in some cases a lambda can be applied directly to a Numpy array, but that doesn't work here: f(time_strings) produces a TypeError. Any thoughts?

I know I could convert back to a Python datetime (which is the direction I'm coming from) or use Pandas. But the datetime_as_string function is really fast and I'd like to stick to a Numpy solution.

--- Conclusions based on answers ---
It turns out that Paul's view-casting black magic was about 75x faster than my list comprehension, and about 100x faster than np.char.replace(). Here are the results from the three methods (each was initialized with the above dataset, but with 1000000 elements).

import time

start = time.time()
time_strings[..., None].view('U1')[..., 10] = ','
print(time.time() - start, 'seconds')
0.016000747680664062 seconds

start = time.time()
f = lambda x: x[:10] + ',' + x[11:]
time_strings = np.array([f(x) for x in time_strings])
print(time.time() - start, 'seconds')
1.1740672588348389 seconds

start = time.time()
time_strings = np.char.replace(time_strings,'T',',')
print(time.time() - start, 'seconds')
1.4980854988098145 seconds
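
For anyone who wants to rerun the comparison, here is a rough sketch that times the three approaches on the same input and checks that they agree. The helper names (make_time_strings, via_view, via_listcomp, via_char_replace) are made up for this example, the array is smaller than the one used above, and the view variant works on a copy so the input stays untouched (so its time includes the copy). Absolute timings will differ by machine and NumPy version.

```python
import time

import numpy as np


def make_time_strings(n):
    # Same construction as the example dataset in the question.
    offset = np.arange(0, n)
    epoch = np.datetime64('1970-01-01T00:00:00.000')
    return np.datetime_as_string(epoch + offset.astype('timedelta64[ms]'))


def via_view(strings):
    # Copy first, then overwrite character 10 of every string in place.
    out = strings.copy()
    out[..., None].view('U1')[..., 10] = ','
    return out


def via_listcomp(strings):
    # Plain Python slicing on each element, then back to an array.
    return np.array([x[:10] + ',' + x[11:] for x in strings])


def via_char_replace(strings):
    # Replaces every 'T'; the ISO 8601 strings here contain only one.
    return np.char.replace(strings, 'T', ',')


base = make_time_strings(10_000)
results = {}
for name, func in [('view', via_view),
                   ('list comprehension', via_listcomp),
                   ('np.char.replace', via_char_replace)]:
    start = time.perf_counter()
    results[name] = func(base)
    print(name, time.perf_counter() - start, 'seconds')

# All three approaches should produce the same strings.
same = (np.array_equal(results['view'], results['list comprehension'])
        and np.array_equal(results['view'], results['np.char.replace']))
print('results agree:', same)
```

For steadier numbers, the timeit module (or %timeit in IPython, as in the answer below) repeats each call several times instead of timing a single run.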

Upvotes: 1

Views: 301

Answers (2)

hpaulj

Reputation: 231665

In [309]: np.char.replace(time_strings,'T',',')
Out[309]: 
array(['1970-01-01,00:00:00.000', '1970-01-01,00:00:00.001',
       '1970-01-01,00:00:00.002', '1970-01-01,00:00:00.003',
       '1970-01-01,00:00:00.004', '1970-01-01,00:00:00.005',
       '1970-01-01,00:00:00.006', '1970-01-01,00:00:00.007',
       ....

But @PaulPanzer's in-place approach is much faster (even if it is a bit more obscure):

In [316]: %%timeit temp=time_strings.copy()
     ...: temp[...,None].view('U1')[...,10] = ','
8.48 µs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [317]: timeit np.char.replace(time_strings,'T',',')
1.23 ms ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 2

Paul Panzer

Reputation: 53099

You could use viewcasting to get access to individual characters:

time_strings[...,None].view('U1')[...,10] = ','

This changes time_strings in place.
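
To make the one-liner less opaque, here is a small sketch of what each step appears to do, using a five-element version of the dataset from the question. The intermediate name chars is introduced only for illustration. As I understand it, the added trailing axis of length 1 is what lets the view split each fixed-width string into single characters, and the view shares memory with the original array, so the assignment shows up in time_strings.

```python
import numpy as np

# Small version of the dataset from the question.
offset = np.arange(0, 5)
epoch = np.datetime64('1970-01-01T00:00:00.000')
time_strings = np.datetime_as_string(epoch + offset.astype('timedelta64[ms]'))

# Add a trailing axis of length 1, then reinterpret each fixed-width
# string as a row of single characters (dtype 'U1').
chars = time_strings[..., None].view('U1')
print(time_strings.shape, time_strings.dtype)  # one string per element
print(chars.shape, chars.dtype)                # one character per element

# chars is a view on the same memory, so writing column 10 (the 'T')
# changes time_strings itself.
chars[..., 10] = ','
print(time_strings)
```

The row width of chars is whatever string width datetime_as_string chose for the dtype, which can be larger than the visible text; the unused positions are padding. Note that this relies on the character to replace sitting at the same index in every string, which holds here because the date part is always 10 characters long.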

Upvotes: 3
