Reputation: 35
I have a dataset of property prices, currently listed by 'SALE_DATE'. I'd like to be able to count the sales by year. The dataset looks like this:
    SALE_DATE    COUNTY  SALE_PRICE
0  2010-01-01    Dublin    343000.0
1  2010-01-03     Laois    185000.0
2  2010-01-04    Dublin    438500.0
3  2010-01-04     Meath    400000.0
4  2010-01-04  Kilkenny    160000.0
This is the code I've tried -
by_year = property_prices['SALE_DATE'] = pd.to_datetime(property_prices['SALE_DATE'])
print(by_year)
I think I'm close but as a biblical noob it's quite frustrating!
Thank you for any help you can provide; this site has been awesome so far for finding little tips and tricks that make my life easier.
Upvotes: 3
Views: 79
Reputation: 66
import pandas as pd
sample_dict = {'Date':['2010-01-11', '2020-01-22', '2010-03-12'], 'Price':[1000,2000,3500]}
df = pd.DataFrame(sample_dict)
# Creating 'year' column using the Date column
df['year'] = df.apply(lambda row: row.Date.split('-')[0], axis=1)
# Group by the new 'year' column (column names are case-sensitive)
df1 = df.groupby('year')
# Print the first value in each group
df1.first()
Output:
            Date  Price
year
2010  2010-01-11   1000
2020  2020-01-22   2000
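Since the question asks for a count per year rather than the first row of each group, here is a minimal sketch of the same string-based approach using size() instead of first(). It assumes the dates are 'YYYY-MM-DD' strings, as in the sample above; the name counts is just illustrative. Note that the year stays a string here, not an integer.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "Date": ["2010-01-11", "2020-01-22", "2010-03-12"],
    "Price": [1000, 2000, 3500],
})

# Assumption: dates are 'YYYY-MM-DD' strings, so the first four characters are the year
df["year"] = df["Date"].str[:4]

# size() returns the number of rows in each group
counts = df.groupby("year").size()
print(counts)
```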
Upvotes: 0
Reputation: 13821
You are close. As you did, you can use pd.to_datetime to convert your SALE_DATE column to datetime. Then groupby the year, using dt.year, which extracts the year from each datetime, and call size() on the result. size() returns the number of rows in each group, and here each group is one year.
property_prices['SALE_DATE'] = pd.to_datetime(property_prices['SALE_DATE'])
property_prices.groupby(property_prices.SALE_DATE.dt.year).size()
Which prints:
SALE_DATE
2010 5
dtype: int64
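For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the five sample rows from the question. The variable names sales_per_year and sales_per_year_alt are only illustrative, and the value_counts() variant is an alternative I'm adding, not something the question requires.

```python
import pandas as pd

# Sample rows copied from the question
property_prices = pd.DataFrame({
    "SALE_DATE": ["2010-01-01", "2010-01-03", "2010-01-04", "2010-01-04", "2010-01-04"],
    "COUNTY": ["Dublin", "Laois", "Dublin", "Meath", "Kilkenny"],
    "SALE_PRICE": [343000.0, 185000.0, 438500.0, 400000.0, 160000.0],
})

# Convert the string dates to real datetimes so the .dt accessor is available
property_prices["SALE_DATE"] = pd.to_datetime(property_prices["SALE_DATE"])

# Count the rows (sales) in each year
sales_per_year = property_prices.groupby(property_prices["SALE_DATE"].dt.year).size()
print(sales_per_year)

# Alternative: value_counts() on the year, sorted chronologically
sales_per_year_alt = property_prices["SALE_DATE"].dt.year.value_counts().sort_index()
```

Both give one entry per year; with the full dataset you would get one row for each year present in SALE_DATE.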
Upvotes: 1