Reputation: 1186
I want to iterate through the rows in pandas and create a new column based on the value. I have my data set here:
Political Entity Recipient ID Recipient Recipient last name \
0 Candidates 4350 Whelan, Susan Whelan
1 Candidates 4350 Whelan, Susan Whelan
2 Candidates 4350 Whelan, Susan Whelan
3 Candidates 4350 Whelan, Susan Whelan
4 Candidates 15453 Mastroianni, Steve Mastroianni
Recipient first name Recipient middle initial Political Party of Recipient \
0 Susan NaN Liberal Party of Canada
1 Susan NaN Liberal Party of Canada
2 Susan NaN Liberal Party of Canada
3 Susan NaN Liberal Party of Canada
4 Steve NaN Liberal Party of Canada
Electoral District Electoral event Fiscal/Election date \
0 Essex 38th general election 2004-06-28
1 Essex 38th general election 2004-06-28
2 Essex 38th general election 2004-06-28
3 Essex 38th general election 2004-06-28
4 Windsor--Tecumseh 40th general election 2008-10-14
... Monetary amount Non-Monetary amount \
0 ... 800.0 0.0
1 ... 1280.0 0.0
2 ... 250.0 0.0
3 ... 1000.0 0.0
4 ... 800.0 0.0
I want to create a new column where it takes the political party and the year date and add the Monetary value. For example:
+------------------------------+----------------------------+--+--+--+
| 2004 Liberal Party of Canada | 2004 Green Party of Canada | | | |
+------------------------------+----------------------------+--+--+--+
| 8000 | 0 | | | |
+------------------------------+----------------------------+--+--+--+
| | | | | |
+------------------------------+----------------------------+--+--+--+
| | | | | |
+------------------------------+----------------------------+--+--+--+
I have created a couple of functions to help get started:
def year_political_column(row):
return row['Fiscal/Election date'][:4] + ' ' + row['Political Party of Recipient']
def monetary(row):
return row['Monetary amount']
Whenever I look up my solution it seems like you have to already have the column set. Can anyone lead me in the right direction?
Sample output should be:
Political Entity Recipient ID Recipient Recipient last name \
0 Candidates 4350 Whelan, Susan Whelan
1 Candidates 4350 Whelan, Susan Whelan
2 Candidates 4350 Whelan, Susan Whelan
3 Candidates 4350 Whelan, Susan Whelan
4 Candidates 15453 Mastroianni, Steve Mastroianni
Recipient first name Recipient middle initial Political Party of Recipient \
0 Susan NaN Liberal Party of Canada
1 Susan NaN Liberal Party of Canada
2 Susan NaN Liberal Party of Canada
3 Susan NaN Liberal Party of Canada
4 Steve NaN Liberal Party of Canada
Electoral District Electoral event Fiscal/Election date \
0 Essex 38th general election 2004-06-28
1 Essex 38th general election 2004-06-28
2 Essex 38th general election 2004-06-28
3 Essex 38th general election 2004-06-28
4 Windsor--Tecumseh 40th general election 2008-10-14
... Monetary amount Non-Monetary amount \
0 ... 800.0 0.0
1 ... 1280.0 0.0
2 ... 250.0 0.0
3 ... 1000.0 0.0
4 ... 800.0 0.0
Contribution given through Ontario first name Ontario last name \
0 NaN J M
1 NaN J
2 NaN B
3 NaN H
4 NaN H
Ontario Address Ontario city Ontario Province Ontario Postal Code \
0
Ontario Phone #
0
1
2
3
4
With all the political data I am looking for attached on the right.
Upvotes: 0
Views: 1077
Reputation: 3103
This can be accomplished via a variety of ways:
pivot
pivot_table
groupby
However, most of them will need some brushing to output the format you need. Only number 2 will work if you are not looking for an aggregate function and would like the entries.
def column_name(row):
return '{} {}'.format(row['Fiscal/Election date'].year, row['initial Political Party of Recipient'])
df['Fiscal/Election date'] = pd.to_datetime(df['Fiscal/Election date'])
df['Column Name'] = df.apply(column_name, axis=1)
1) pivot_table
In [4]: df[['Column Name', 'Monetary amount']].pivot_table(columns='Column Name'
...: ,
...: values='Monetary amou
...: nt',
...: aggfunc='sum')
...:
Out[4]:
Column Name 2004 Liberal Party of Canada 2008 Liberal Party of Canada
Monetary amount 3330 800
2) pivot
In [5]: (df[['Column Name', 'Monetary amount']]
...: .pivot(columns='Column Name', values='Monetary amount'))
Out[5]:
Column Name 2004 Liberal Party of Canada 2008 Liberal Party of Canada
0 800.0 NaN
1 1280.0 NaN
2 250.0 NaN
3 1000.0 NaN
4 NaN 800.0
3) groupby
In [6]: pd.DataFrame(df.groupby('Column Name')['Monetary amount'].sum()).transpo
...: se()
Out[6]:
Column Name 2004 Liberal Party of Canada 2008 Liberal Party of Canada
Monetary amount 3330 800
Upvotes: 1
Reputation: 1387
Create a column using election year and party name, then do a groupby and transpose:
df['year_political'] = df['Fiscal/Election date'].astype(str).str.slice(0,4) + ' '+ df['Political Party of Recipient']
df.groupby('year_political')['Monetary amount'].sum().reset_index().transpose()
Upvotes: 1