Reputation: 65
I try to split and output the csv file. I must use the date to be the file name but don't need the time.
So I want to split the Order_Date, which is a timestamp that has both date and time.
How can I group by a part of value in pandas?
There is my code:
import csv
import re
import pandas as pd
import os
df = pd.read_csv('test.csv',delimiter='|')
for i,x in df.groupby('Order_Date'):
p = os.path.join(r'~/Desktop/',("data_{}.csv").format(i.lower()))
x.to_csv(p,sep = '|', index=False)
Now I can get this:
data_2019-07-23 00:06:00.csv
data_2019-07-23 00:06:50.csv
data_2019-07-23 00:06:55.csv
data_2019-07-28 12:31:00.csv
Example test.csv data:
Channel|Store_ID|Store_Code|Store_Type|Order_ID|Order_Date|Member_ID|Member_Tier|Coupon_ID|Order_Total|Material_No|Material_Name|Size|Quantity|Unit_Price|Line_Total|Discount_Amount
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|7564|Full Zip|750|1.00|399.00|168.00|231.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|1361|COOL TEE|200|1.00|199.00|84.00|115.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|7699|PANT|690|1.00|499.00|210.00|289.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:00||||1064.00|8700|AI DRESS|690|1.00|399.00|196.00|203.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:50||||1064.00|8438|COPA|690|1.00|229.00|112.00|117.00
ECOM|ECOM|ECOM|ECOM|A190700|2019-07-23 00:06:55||||1064.00|8324|CLASS|350|1.00|599.00|294.00|305.00
ECOM|ECOM|ECOM|ECOM|A190701|2019-07-28 12:31:00||||798.00|3689|DRESS|500|1.00|699.00|294.00|405.00
Expect I get this:
data_2019-07-23.csv
data_2019-07-28.csv
Any help would be very much appreciated.
Upvotes: 0
Views: 344
Reputation: 36
You can try making the i a string and then using .split() and then using the 0 index:
str(i).split()[0]
so replaced in your code:
for i,x in df.groupby('Order_Date'):
p = os.path.join(r'~/Desktop/',("data_{}.csv").format(str(i).split()[0]))
x.to_csv(p,sep = '|', index=False)
Upvotes: 0
Reputation: 1387
You need to convert Order_Date
to dates - stripping the time information. One quick way to do this is:
df['Order_Date1'] = pd.to_datetime(df['Order_Date']).dt.strftime('%Y-%m-%d')
Then proceed with a groupby using Order_Date1
.
Upvotes: 2