AstroFloyd

Reputation: 468

How can I convert several NumPy arrays with ints to a NumPy array with formatted strings?

I have three arrays of the same length containing integers: years, months and days. I want to create a (NumPy) array of the same length, containing formatted strings like '(-)yyyy-mm-dd' using the format '%i-%2.2i-%2.2i'.

For the scalar case, I would do something like

year=2000; month=1; day=1
datestr = '%i-%2.2i-%2.2i' % (year, month, day)

which would yield '2000-01-01'.

How can I create the vector version of this, e.g.:

import numpy as np
years  = np.array([-1000, 0, 1000, 2000])
months = np.array([1, 2, 3, 5])
days   = np.array([1, 11, 21, 31])
datestr_array = np.somefunction(years, months, days, format='%i-%2.2i-%2.2i', ???)

Note that the date range I am interested in lies between the years -2000 and +3000 (CE), so neither Python's datetime nor Pandas' DatetimeIndex offers a solution.

Upvotes: 0

Views: 115

Answers (2)

hpaulj

Reputation: 231738

A simple list comprehension will be faster than numpy functions:

['%i-%2.2i-%2.2i'%(y,m,d) for y,m,d in zip(years, months,days)]

For a DataFrame:

arr = df[['year','month','day']].values   # a (n,3) array
['%i-%2.2i-%2.2i'%(y,m,d) for y,m,d in arr]

Adding arr = arr.tolist() first might add some speed, since iterating over a list is faster than iterating over an array.
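As a minimal sketch of the list-comprehension approach, the snippet below wraps it in a small helper and converts the result back to a NumPy array, which is what the question asks for. The helper name format_dates and its fmt parameter are hypothetical and not part of NumPy; the sample data is taken from the question.

```python
import numpy as np

def format_dates(years, months, days, fmt='%i-%2.2i-%2.2i'):
    """Return a NumPy string array with one formatted date per element.

    Hypothetical helper for illustration; not a NumPy function.
    """
    # tolist() turns NumPy integers into plain Python ints before looping
    rows = zip(np.asarray(years).tolist(),
               np.asarray(months).tolist(),
               np.asarray(days).tolist())
    # Each row is a (year, month, day) tuple, which %-formatting accepts directly
    return np.array([fmt % row for row in rows])

years  = np.array([-1000, 0, 1000, 2000])
months = np.array([1, 2, 3, 5])
days   = np.array([1, 11, 21, 31])

datestr_array = format_dates(years, months, days)
print(datestr_array)
```

Building the list first and calling np.array once lets NumPy pick a string width that fits the longest entry, including the leading minus sign of negative years.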

Upvotes: 1

Larry the Llama

Reputation: 1100

Explanation

Let's create a function that will convert any date without bounds to a yyyy-mm-dd string. We can use string formatting, where we create a predefined string and simply format in the relevant data. We also need to format the length to have zeros at the front to 'fill it out', i.e. 2001-05-20.

To run this function, the corresponding years, months and days must be grouped together. This can be achieved with zip, which pairs up the elements of the three arrays into (y, m, d) tuples. Preferably, we then convert the result to a NumPy array, so that each row holds one date.

Now that the data is in rows of (y, m, d), let's pass it through our function. We can create a new array by applying the function to each row with numpy.apply_along_axis(func1d, axis, arr). Because each date's values lie along the second axis, the axis parameter must be set to 1.

Code

import numpy

def FormatDate(data):
    # data is a 1-D sequence holding (year, month, day)
    # Note that this formatting can later be updated to account for edge cases
    return "{0:04}-{1:02}-{2:02}".format(data[0], data[1], data[2])

# Convert the data into rows where y, m, d are aligned
converted = numpy.array(list(zip(years, months, days)))

# Now, let's apply the function to turn every row into a date string
datestr_array = numpy.apply_along_axis(FormatDate, 1, converted)
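One caveat worth checking when the formatted strings differ in length: as far as I can tell, numpy.apply_along_axis sizes its output from the first result, so later, longer strings may be cut off. The sketch below illustrates this with the question's '%i-%2.2i-%2.2i' format; the helper name format_row and the two-row sample array are made up for the demonstration.

```python
import numpy as np

def format_row(row):
    # row is a 1-D integer array: [year, month, day]
    return '%i-%2.2i-%2.2i' % tuple(row)

# The shortest string comes first on purpose (year 0 gives '0-02-11')
ymd = np.array([[0, 2, 11],
                [1000, 3, 21]])

# The output dtype is taken from the first result (7 characters),
# so the second, longer string is stored truncated
truncated = np.apply_along_axis(format_row, 1, ymd)

# Building a list first lets NumPy choose a width that fits every string
safe = np.array([format_row(row) for row in ymd])

print(truncated)
print(safe)
```

With the data from the question the first row happens to produce the longest string ('-1000-01-01'), so the problem stays hidden there, but reordering the input could expose it.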

Upvotes: 2
