Reputation: 1521
I'm just starting with Pandas, and Python. I have a CSV dump for the yearly transactions from my bank. Every tax season, I'm required to prepare a report of max values reached during each month (and the specific date), and the max value overall:
Sample data:
df = pd.DataFrame(data={'Date': ['2018-01-01','2018-01-05', '2018-05-01'],
'Transaction': ['CREDIT', 'DEBIT', 'CREDIT'],
'Amount': [100.20, -50.00, 200.00]})
I can't figure out how to use pd.to_datetime on an inline DataFrame.
Tried df['Date'].apply(pd.to_datetime)
but got an error
ValueError: ('Unknown string format:', 'CREDIT', 'occurred at index # Transaction')
df = pd.read_csv("~/Downloads/cheq.csv", parse_dates=[0], na_values="n/a")
df = pd.DataFrame(data, columns=['Date', 'Transaction', 'Amount'])
df.set_index(['Date'], drop=True, inplace=True)
grouped = df.groupby(pd.Grouper(freq="M")) # DataFrameGroupBy (grouped by Month)
for g, v in grouped:
print(g, v.max())
Output:
2018-01-31 00:00:00 Transaction DEBIT
Amount 100.02
dtype: object
2018-02-28 00:00:00 Transaction CREDIT
Amount 200
dtype: object
What I would like to see is (some form of):
2018-01-01 00:00:00 Transaction DEBIT
Amount 100.02
2018-02-01 00:00:00 Transaction CREDIT
Amount 200
Thanks for any help.
Upvotes: 1
Views: 514
Reputation: 16966
You have the convert the date format and then apply groupBy. Try this!
df = pd.DataFrame(data={'Date': ['2018-01-01','2018-01-05', '2018-05-01'],
'Transaction': ['CREDIT', 'DEBIT', 'CREDIT'],
'Amount': [100.20, -50.00, 200.00]})
df['Date'] = pd.to_datetime(df['Date'])
print(df.groupby(df['Date'].dt.strftime('%B')).max())
#output:
Amount Date Transaction
Date
January 100.2 2018-01-05 DEBIT
May 200.0 2018-05-01 CREDIT
Upvotes: 1