Lewis Menelaws
Lewis Menelaws

Reputation: 1186

How to create a new column based on row value in pandas

I want to iterate through the rows in pandas and create a new column based on the value. I have my data set here:

  Political Entity  Recipient ID           Recipient Recipient last name  \
0       Candidates          4350       Whelan, Susan              Whelan   
1       Candidates          4350       Whelan, Susan              Whelan   
2       Candidates          4350       Whelan, Susan              Whelan   
3       Candidates          4350       Whelan, Susan              Whelan   
4       Candidates         15453  Mastroianni, Steve         Mastroianni   

  Recipient first name Recipient middle initial Political Party of Recipient  \
0                Susan                      NaN      Liberal Party of Canada   
1                Susan                      NaN      Liberal Party of Canada   
2                Susan                      NaN      Liberal Party of Canada   
3                Susan                      NaN      Liberal Party of Canada   
4                Steve                      NaN      Liberal Party of Canada   

  Electoral District        Electoral event Fiscal/Election date  \
0              Essex  38th general election           2004-06-28   
1              Essex  38th general election           2004-06-28   
2              Essex  38th general election           2004-06-28   
3              Essex  38th general election           2004-06-28   
4  Windsor--Tecumseh  40th general election           2008-10-14   

        ...       Monetary amount Non-Monetary amount  \
0       ...                 800.0                 0.0   
1       ...                1280.0                 0.0   
2       ...                 250.0                 0.0   
3       ...                1000.0                 0.0   
4       ...                 800.0                 0.0   

I want to create a new column where it takes the political party and the year date and add the Monetary value. For example:

+------------------------------+----------------------------+--+--+--+
| 2004 Liberal Party of Canada | 2004 Green Party of Canada |  |  |  |
+------------------------------+----------------------------+--+--+--+
| 8000                         | 0                          |  |  |  |
+------------------------------+----------------------------+--+--+--+
|                              |                            |  |  |  |
+------------------------------+----------------------------+--+--+--+
|                              |                            |  |  |  |
+------------------------------+----------------------------+--+--+--+

I have created a couple of functions to help get started:

def year_political_column(row):
    return row['Fiscal/Election date'][:4] + ' ' + row['Political Party of Recipient']


def monetary(row):
    return row['Monetary amount']

Whenever I look up my solution it seems like you have to already have the column set. Can anyone lead me in the right direction?

Sample output should be:

  Political Entity  Recipient ID           Recipient Recipient last name  \
0       Candidates          4350       Whelan, Susan              Whelan   
1       Candidates          4350       Whelan, Susan              Whelan   
2       Candidates          4350       Whelan, Susan              Whelan   
3       Candidates          4350       Whelan, Susan              Whelan   
4       Candidates         15453  Mastroianni, Steve         Mastroianni   

  Recipient first name Recipient middle initial Political Party of Recipient  \
0                Susan                      NaN      Liberal Party of Canada   
1                Susan                      NaN      Liberal Party of Canada   
2                Susan                      NaN      Liberal Party of Canada   
3                Susan                      NaN      Liberal Party of Canada   
4                Steve                      NaN      Liberal Party of Canada   

  Electoral District        Electoral event Fiscal/Election date  \
0              Essex  38th general election           2004-06-28   
1              Essex  38th general election           2004-06-28   
2              Essex  38th general election           2004-06-28   
3              Essex  38th general election           2004-06-28   
4  Windsor--Tecumseh  40th general election           2008-10-14   

        ...       Monetary amount Non-Monetary amount  \
0       ...                 800.0                 0.0   
1       ...                1280.0                 0.0   
2       ...                 250.0                 0.0   
3       ...                1000.0                 0.0   
4       ...                 800.0                 0.0   

  Contribution given through Ontario first name Ontario last name  \
0                        NaN                J M            
1                        NaN                  J             
2                        NaN                  B            
3                        NaN                  H            
4                        NaN                  H            

   Ontario Address Ontario city Ontario Province Ontario Postal Code  \
0                

  Ontario Phone #  
0      
1      
2      
3      
4      

With all the political data I am looking for attached on the right.

Upvotes: 0

Views: 1077

Answers (2)

iDrwish
iDrwish

Reputation: 3103

This can be accomplished via a variety of ways:

  • pivot
  • pivot_table
  • groupby

However, most of them will need some brushing to output the format you need. Only number 2 will work if you are not looking for an aggregate function and would like the entries.

def column_name(row):
    return '{} {}'.format(row['Fiscal/Election date'].year, row['initial Political Party of Recipient'])

df['Fiscal/Election date'] = pd.to_datetime(df['Fiscal/Election date'])

df['Column Name'] = df.apply(column_name, axis=1)

1) pivot_table

In [4]: df[['Column Name', 'Monetary amount']].pivot_table(columns='Column Name'
   ...: , 
   ...:                                                    values='Monetary amou
   ...: nt', 
   ...:                                                    aggfunc='sum')
   ...:                                                    
Out[4]: 
Column Name      2004 Liberal Party of Canada  2008 Liberal Party of Canada
Monetary amount                          3330                           800

2) pivot

In [5]: (df[['Column Name', 'Monetary amount']]
   ...: .pivot(columns='Column Name', values='Monetary amount'))
Out[5]: 
Column Name  2004 Liberal Party of Canada  2008 Liberal Party of Canada
0                                   800.0                           NaN
1                                  1280.0                           NaN
2                                   250.0                           NaN
3                                  1000.0                           NaN
4                                     NaN                         800.0

3) groupby

In [6]: pd.DataFrame(df.groupby('Column Name')['Monetary amount'].sum()).transpo
   ...: se()
Out[6]: 
Column Name      2004 Liberal Party of Canada  2008 Liberal Party of Canada
Monetary amount                          3330                           800

Upvotes: 1

xyzjayne
xyzjayne

Reputation: 1387

Create a column using election year and party name, then do a groupby and transpose:

df['year_political'] = df['Fiscal/Election date'].astype(str).str.slice(0,4) + ' '+ df['Political Party of Recipient']
df.groupby('year_political')['Monetary amount'].sum().reset_index().transpose()

Upvotes: 1

Related Questions