Reputation: 285

Format numpy array of timestamps into a concatenated string

I have an array of unix timestamps:

d = {'timestamp': [1551675611, 1551676489, 1551676511, 1551676533, 1551676554]}
df = pd.DataFrame(data=d)
timestamps = df[['timestamp']].values

That I would like to format into a concatenated string, like so:

'1551675611;1551676489;1551676511;1551676533;1551676554'

So far I have prepared this:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp:f}" for timestamp in timestamps])
    return timestamps

Running:

format_timestamps(timestamps)

Gives the following error:

TypeError: unsupported format string passed to numpy.ndarray.__format__

Since I'm new to Python, I'm having trouble understanding how to fix this error.

Upvotes: 0

Answers (4)

JPI93

Reputation: 1557

Why the error?

You're getting this error because of how you extract the 'timestamp' column values with the following line:

timestamps = df[['timestamp']].values

Selecting columns by passing a list of column names, as here, returns a DataFrame rather than a Series, so .values gives a two-dimensional ndarray: each top-level element is itself an ndarray holding one row's values for the listed columns. This approach is generally only useful when selecting multiple columns by name.
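A minimal sketch of the difference, using a shortened copy of the data from the question (the names two_d and one_d are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'timestamp': [1551675611, 1551676489, 1551676511]})

two_d = df[['timestamp']].values  # list of names -> DataFrame -> 2-D array
one_d = df['timestamp'].values    # single name   -> Series    -> 1-D array

print(two_d.shape)              # (3, 1)
print(one_d.shape)              # (3,)
print(type(two_d[0]).__name__)  # ndarray: each element is a one-value row
print(type(one_d[0]).__name__)  # int64: each element is a scalar
```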

The error is thrown by your function because each timestamp here:

";".join([f"{timestamp:f}" for timestamp in timestamps])

Is an ndarray containing a single value when timestamps is defined as in your original post, whereas the f format specifier expects a scalar number.
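The failure can be reproduced in isolation. The sketch below assumes one row of the two-dimensional array looks like row (an illustrative name), and shows that pulling the scalar out first formats without complaint:

```python
import numpy as np

row = np.array([1551675611])  # stands in for one row of the 2-D array, shape (1,)

try:
    f"{row:f}"                # a 1-D array does not accept the 'f' specifier
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

formatted = f"{row[0]:f}"     # row[0] is a numpy scalar, which does accept 'f'
print(formatted)              # 1551675611.000000
```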

Accounting for the error

To remedy this error in your code, simply replace:

timestamps = df[['timestamp']].values

With:

timestamps = df['timestamp'].values

By passing a single str to extract a single column from your DataFrame, timestamps becomes a one-dimensional ndarray holding the 'timestamp' value for each row, which will pass through your original format_timestamps without error.

`format_timestamps`

Running format_timestamps(timestamps) using the above approach and your original implementation of format_timestamps will return:

'1551675611.000000;1551676489.000000;1551676511.000000;1551676533.000000;1551676554.000000'

This is better (no errors at least) but still not quite what you want. The root of this issue is that you are passing f as the format specifier when joining timestamp values. This formats each value as a float, when you actually want to format each value as an int (format specifier d).
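As a quick check of how the specifier changes the output, here is a sketch on a single numpy.int64 value (ts is an illustrative name):

```python
import numpy as np

ts = np.int64(1551675611)

as_float = f"{ts:f}"  # 'f' renders six decimal places
as_int = f"{ts:d}"    # 'd' renders a plain integer
plain = f"{ts}"       # no specifier falls back to str(ts)

print(as_float)  # 1551675611.000000
print(as_int)    # 1551675611
print(plain)     # 1551675611
```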

You can either change your format specifier from f to d in your function definition:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp:d}" for timestamp in timestamps])
    return timestamps

Or simply omit the format specifier, since the values in timestamps are already of type numpy.int64:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp}" for timestamp in timestamps])
    return timestamps

Running format_timestamps(timestamps) using either definition above will return what you're after:

'1551675611;1551676489;1551676511;1551676533;1551676554'

Upvotes: 1

cs95

Reputation: 402814

Since you have pandas, why not consider a pandaic solution with str.cat:

df['timestamp'].astype(str).str.cat(sep=';')
# '1551675611;1551676489;1551676511;1551676533;1551676554'

If NaNs or invalid data are a possibility, you can handle them with pd.to_numeric:

(pd.to_numeric(df['timestamp'], errors='coerce')
   .dropna()
   .astype(int)
   .astype(str)
   .str.cat(sep=';'))
# '1551675611;1551676489;1551676511;1551676533;1551676554'
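To illustrate what the coercion buys you, here is a sketch on a deliberately messy column. The frame raw and its bad entries are made up for the example and are not from the question:

```python
import pandas as pd

# Hypothetical dirty input: a junk string, a missing value, a numeric string
raw = pd.DataFrame({'timestamp': [1551675611, 'not a time', None, '1551676489']})

cleaned = (pd.to_numeric(raw['timestamp'], errors='coerce')  # invalid -> NaN
             .dropna()                                       # drop the NaN rows
             .astype(int)                                    # float -> int
             .astype(str)
             .str.cat(sep=';'))

print(cleaned)  # 1551675611;1551676489
```

The intermediate astype(int) matters because coercion produces floats; without it the strings would carry a trailing '.0'.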

Another idea is to iterate over the list of timestamps and join:

';'.join([f'{t}' for t in df['timestamp'].tolist()])
# '1551675611;1551676489;1551676511;1551676533;1551676554'

Upvotes: 2

juanpa.arrivillaga

Reputation: 96172

It's because in your list comprehension, timestamp is a numpy.ndarray object. Just flatten first and convert to string:

>>> ";".join(timestamps.flatten().astype(str))
'1551675611;1551676489;1551676511;1551676533;1551676554'

Upvotes: 2

Valentin Macé

Reputation: 1272

A quick fix to your code would be:

def format_timestamps(timestamps: np.array) -> str:
    timestamps = ";".join([f"{timestamp[0]}" for timestamp in timestamps])
    return timestamps

Here I only replaced timestamp:f with timestamp[0], so you get each timestamp as a scalar instead of an array.
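Note that indexing with [0] assumes the two-dimensional input from the question. If the function might also receive a one-dimensional array, one possible variant (a sketch, not the only way) is to flatten with np.ravel first. The annotation here uses np.ndarray, which is the array type, whereas np.array is the function that constructs arrays:

```python
import numpy as np

def format_timestamps(timestamps: np.ndarray) -> str:
    # np.ravel gives a 1-D view/copy regardless of the input's shape
    flat = np.ravel(timestamps)
    # int() is an assumption that the values are whole seconds;
    # it would truncate any fractional part
    return ";".join(str(int(t)) for t in flat)

two_d = np.array([[1551675611], [1551676489], [1551676511]])
one_d = np.array([1551675611, 1551676489, 1551676511])

print(format_timestamps(two_d))  # 1551675611;1551676489;1551676511
print(format_timestamps(one_d))  # 1551675611;1551676489;1551676511
```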