Reputation: 8669
Let say I have a dataset separated into years from 2005-2013
df = pd.read_csv('adj.csv')
with columns "Incident", "Vector", "Year", ...etc
I want the data separated by year and then the same function(pivot) is applied to each data set. Right now to produce a summary I am writing out code like this:
df2005=df[df['Year'] == 2005]
df2006=df[df['Year'] == 2006]
df2007=df[df['Year'] == 2007]
df2008=df[df['Year'] == 2008]
df2009=df[df['Year'] == 2009]
df2010=df[df['Year'] == 2010]
df2011=df[df['Year'] == 2011]
df2012=df[df['Year'] == 2012]
df2013=df[df['Year'] == 2013]
p2005=df2005.pivot_table(columns='Incident', index='Vector',
aggfunc=len,)
p2006=df2006.pivot_table(columns='Incident', index='Vector',
aggfunc=len,)
p2007=df2007.pivot_table(columns='Incident', index='Vector',
aggfunc=len,)
Is there a more efficient way to do this or shorten this process without copying and pasting code df20XX=df[df['Year'] == 20XX], p20XX=df20XX.pivot_table(columns='Incident', index='Vector', aggfunc=len,)
and changing the XX
to the year?
Upvotes: 0
Views: 181
Reputation: 394469
I think something like the following would satisfy your requirements:
# get an array of the unique year values
year_vals = df.Year.unique()
# create our dict
data_dict={}
# iterate for each year make a key and set the value to the pivot table
for year in year_vals:
data_dict[str(year)] = df[df.Year == year].pivot_table(columns='Incident', index='Vector',
aggfunc=len,)
You can then access any specific year in the dict like so:
# this retrieves the pivot table for 2012
data_dict['2012']
Upvotes: 2
Reputation: 60756
Welcome to the world of programming! If you EVER find yourself hard coding 'magic numbers' such as dates in your code, you're probably doing it wrong.
As @EdChum mentioned in the comments, one way to get around hard coded years is by iterating over a list.
Your question gives us no way to reproduce your challenge because you don't provide us any data. So I'll make some up:
np.random.seed(1)
n = 1000
df = pd.DataFrame({'somedata' : np.random.normal(5,2,size=n),
'Year' : np.random.randint(10, size=n)})
print df.head()
Year somedata
0 2 8.248691
1 8 3.776487
2 9 3.943656
3 9 2.854063
4 9 6.730815
you can iterate over each year like this:
for year in df.Year.unique():
print year
The more Pandas-esque way of doing this is by using groupby()
. Here's an example:
print df.groupby('Year').sum()
somedata
Year
0 566.042926
1 464.741896
2 481.299877
3 483.864814
4 448.654140
5 576.779496
6 495.640276
7 597.023765
8 456.119432
9 507.458331
You can write almost any function to work with groupby()
. To learn more about this I highly recommend reading the excellent Pandas Documentation about groupby()
.
Upvotes: 0