Reputation: 3721
The following operation
import time
import pandas as pd
import numpy as np
data = pd.read_csv(fname,sep=",",quotechar='"')
will create a 650,000 x 9 dataframe. The first column contains dates, and the following function is designed to split a single date stamp into 5 separate features.
def timepartition(elm):
    tm = time.strptime(elm,"%Y-%m-%d %H:%M:%S")
    return tm[0], tm[1], tm[2], tm[3], tm[4]
data["Dates"].map(timepartition)
What I would like is to assign those 5 values to a 650,000x7 np matrix.
xtrn = np.zeros(shape=(data.shape[0],7))
xtrn[:,0:4] = np.asarray(data["Dates"].map(timepartition))
#above returns error ValueError: could not broadcast input array from shape (650000) into shape (650000,4)
Upvotes: 0
Views: 1063
Reputation: 605
You might try using some of the built-in pandas features.
dates = pd.to_datetime(data['Dates'])
date_df = pd.DataFrame(dict(
    year=dates.dt.year,
    month=dates.dt.month,
    day=dates.dt.day,
    hour=dates.dt.hour,
    minute=dates.dt.minute,
))
xtrn[:, :5] = date_df.values # use date_df[['year', 'month', 'day', 'hour', 'minute']] if the column order comes out wrong
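As a rough end-to-end sketch of the pandas approach above, the snippet below uses a tiny two-row frame as a hypothetical stand-in for the 650,000-row `data` from the question. The column list `cols` and the explicit `format` argument are assumptions added for illustration; selecting `[cols]` pins the column order regardless of how the frame was built.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real data read from CSV.
data = pd.DataFrame({"Dates": ["2015-05-13 23:53:00", "2015-01-02 04:05:06"]})

# Parse the strings once; the format matches the one used in the question.
dates = pd.to_datetime(data["Dates"], format="%Y-%m-%d %H:%M:%S")

# Build one column per date component and fix the column order explicitly.
cols = ["year", "month", "day", "hour", "minute"]
date_df = pd.DataFrame({name: getattr(dates.dt, name) for name in cols})[cols]

# Fill the first five columns of the 7-column matrix; the rest stay zero.
xtrn = np.zeros(shape=(data.shape[0], 7))
xtrn[:, :5] = date_df.to_numpy()
```

Because the parsing is vectorized, this avoids calling a Python function once per row, which may matter at 650,000 rows.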
Upvotes: 1
Reputation: 378
The following worked for me. I'm not sure which method is faster, but it was easier for me to understand logically what's going on. Here my dataset "crimes" is your "data" and our time formats are a bit different.
def timepartition(elm):
    # %I (12-hour clock) is needed for %p (AM/PM) to affect the parsed hour
    tm = time.strptime(elm,"%m/%d/%Y %I:%M:%S %p")
    return tm[0:5]
zeros = np.zeros(shape=(crimes.shape[0],3), dtype=int)
dates = np.array([timepartition(d) for d in crimes["Date"]])
new = np.hstack((dates,zeros))
Upvotes: 0
Reputation: 3721
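Calling map on a dataframe column returns a new Series object. Because timepartition returns tuples, the result is an object-dtype Series with one tuple per row, and np.asarray turns that into a 1-D array of shape (650000,) rather than a 2-D numeric array. That is why the assignment cannot be broadcast.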
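The failure described above can be reproduced on a small scale. This is only a sketch: the three-row frame is a hypothetical stand-in for the real data, and the flag `failed` is introduced purely to record whether the assignment raised.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real data read from CSV.
data = pd.DataFrame({"Dates": ["2015-05-13 23:53:00",
                               "2015-01-02 04:05:06",
                               "2014-12-31 00:00:59"]})

def timepartition(elm):
    tm = time.strptime(elm, "%Y-%m-%d %H:%M:%S")
    return tm[0], tm[1], tm[2], tm[3], tm[4]

mapped = data["Dates"].map(timepartition)  # Series holding one tuple per row
arr = np.asarray(mapped)                   # 1-D array of tuple objects

print(mapped.dtype, arr.shape)

xtrn = np.zeros(shape=(data.shape[0], 7))
try:
    xtrn[:, 0:5] = arr   # a 1-D array of tuples cannot fill a 2-D slice
    failed = False
except (ValueError, TypeError):
    failed = True
```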
Another approach is to make the following change to timepartition:
def timepartition(elm):
    tm = time.strptime(elm,"%Y-%m-%d %H:%M:%S")
    return [tm[i] for i in range(5)]
This now returns a list instead of a tuple. The following code builds a matrix with the desired dimensions from the dataframe series and assigns it to xtrn.
xtrn[:,0:5] = np.matrix(list(map(timepartition, data["Dates"].tolist())))
np.matrix infers a 2-D matrix from the nested lists produced by applying the partitioning function to a list representation of the series, which is flat in this case. In Python 3, map returns an iterator, so it has to be wrapped in list() first.
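A minimal runnable sketch of this approach is shown below, with two assumptions worth flagging: a two-row frame stands in for the real data, and np.array with a list comprehension is swapped in for np.matrix and map, since a plain 2-D array is enough for the slice assignment.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real data read from CSV.
data = pd.DataFrame({"Dates": ["2015-05-13 23:53:00", "2015-01-02 04:05:06"]})

def timepartition(elm):
    tm = time.strptime(elm, "%Y-%m-%d %H:%M:%S")
    return [tm[i] for i in range(5)]  # year, month, day, hour, minute

# Nested lists of equal length become a 2-D integer array of shape (n, 5).
parts = np.array([timepartition(s) for s in data["Dates"].tolist()])

xtrn = np.zeros(shape=(data.shape[0], 7))
xtrn[:, 0:5] = parts  # shapes now match, so the assignment succeeds
```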
Upvotes: 0