Reputation: 126
I want to convert a pandas DataFrame to a NumPy array and keep the groupby label with it. The groupby key is extracted with a regex, so it's important to carry the label along.
My data is in the format:
start_date,is_member
2014-04-15 00:01,1
2014-04-15 00:01,1
2014-04-15 01:01,1
2014-04-15 01:01,1
2014-04-15 02:02,1
2014-04-15 03:05,1
I have tried:
import pandas as pd
df = pd.read_csv(filename, header=0)
# raw string avoids the invalid "\:" escape; the colon needs no escaping in a regex
df = df.groupby(df.start_date.str.extract(r"^(.*?):", expand=False))[['start_date']].count()
print(df)
The output of the DataFrame is:
               start_date
2014-04-15 00           2
2014-04-15 01           2
2014-04-15 02           1
2014-04-15 03           1
I tried converting it to a NumPy array with:
numpy_array = df.values
The resulting array contains only the count values:
[[2]
[2]
[1]
[1]]
I want the start date included as a column:
[[2014-04-15 00 2]
[2014-04-15 01 2]
[2014-04-15 02 1]
[2014-04-15 03 1]]
Upvotes: 3
Views: 150
Reputation: 862771
I believe you need to convert the index to a column with DataFrame.reset_index:
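# simplified code: select the single column so count() returns a Series
# (raw string avoids the invalid "\:" escape; the colon needs no escaping)
df = df.groupby(df.start_date.str.extract(r"^(.*?):", expand=False))['start_date'].count()
# name the index, move it into a regular column, then take the underlying array
numpy_array = df.rename_axis('index').reset_index().values
print(numpy_array)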
[['2014-04-15 00' 2]
['2014-04-15 01' 2]
['2014-04-15 02' 1]
['2014-04-15 03' 1]]
Or for pandas 0.24+ use:
numpy_array = df.rename_axis('index').reset_index().to_numpy()
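For anyone who wants to run this end to end, here is a minimal self-contained sketch of the approach above. It assumes the six sample rows from the question and uses io.StringIO in place of the CSV file; the names csv_text, hour_label and counts are only illustrative and not from the original code.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the CSV file.
csv_text = """start_date,is_member
2014-04-15 00:01,1
2014-04-15 00:01,1
2014-04-15 01:01,1
2014-04-15 01:01,1
2014-04-15 02:02,1
2014-04-15 03:05,1
"""

df = pd.read_csv(io.StringIO(csv_text), header=0)

# Group label: everything before the first colon, e.g. "2014-04-15 00".
hour_label = df["start_date"].str.extract(r"^(.*?):", expand=False)

# Count rows per label; the labels become the index of the resulting Series.
counts = df.groupby(hour_label)["start_date"].count()

# Move the index into a regular column, then convert to a NumPy array.
numpy_array = counts.rename_axis("index").reset_index().to_numpy()

print(numpy_array)
```

Because the array mixes strings and integers, it cannot use a purely numeric dtype, so numeric operations on the count column may need a cast first, for example numpy_array[:, 1].astype(int).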
Upvotes: 2