Steven
Steven

Reputation: 854

Using a for loop to concatenate columns in Pandas

I'm quite new to Python and I'm trying to use Pandas (in iPython Notebook, Python 3) to combine three columns. This is the original data:

       RegistrationID  FirstName  MiddleInitial   LastName    
           1              John       P             Smith    
           2              Bill       Missing       Jones   
           3              Paul       H             Henry  

And I'd like to have:

   RegistrationID FirstName MiddleInitial   LastName    FullName
     1              John       P             Smith   Smith, John, P 
     2              Bill       Missing       Jones   Jones, Bill 
     3              Paul       H             Henry   Henry, Paul, H 

I'm sure this is absolutely not the correct way of doing this, but this is how I have set it up so far in a for loop. Unfortunately, it just keeps going and going and never finishes.

%matplotlib inline
import pandas as pd

from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))

reg = pd.DataFrame.from_csv('regcontact.csv', index_col=RegistrationID)

for item, frame in regcombo['MiddleInitial'].iteritems():
while frame == 'Missing':
   reg['FullName'] = reg.LastName.map(str) + ", " + reg.FirstName 
else: break 

The idea is then to add another column for those with complete names (i.e. including MiddleInitial):

for item, frame in regcombo['MiddleInitial'].iteritems():
while frame != 'Missing':
   reg['FullName1'] = reg.LastName.map(str) + ", " + reg.FirstName + ", " + reg.MiddleInitial
else: break 

And then combine them, so that there are no null values. I've looked everywhere, but I can't quite figure it out. Any help would be appreciated, and I apologize in advance if I have broken any conventions, as this is my first post.

Upvotes: 0

Views: 2830

Answers (2)

Ami Tavory
Ami Tavory

Reputation: 76391

All you need to do is add the columns:

>>> df.FirstName + ', ' + df.LastName + ', ' + df.FullName.str.replace(', Missing', '')
0          John, Smith, P
1    Bill, Jones, Missing
2          Paul, Henry, H
dtype: object

To add a new column, you could just write:

df['FullName'] = df.FirstName + ', ' + ...

(In Pandas, it is usually attempted to avoid loops and such.)

Upvotes: 1

Alexander
Alexander

Reputation: 109746

This uses a list comprehension to create the new dataframe column, e.g. [(a, b, c) for a, b, c in some_iterable_item].

df['Full Name'] = [
   "{0}, {1} {2}"
   .format(last, first, middle if middle != 'Missing' else "").strip() 
   for last, first, middle 
   in df[['LastName', 'FirstName', 'MiddleInitial']].values]

>>> df
   RegistrationID FirstName MiddleInitial LastName      Full Name
0               1      John             P    Smith  Smith, John P
1               2      Bill       Missing    Jones    Jones, Bill
2               3      Paul             H    Henry  Henry, Paul H

The iterable_item is the array of values from the dataframe:

>>> df[['LastName', 'FirstName', 'MiddleInitial']].values
array([['Smith', 'John', 'P'],
       ['Jones', 'Bill', 'Missing'],
       ['Henry', 'Paul', 'H']], dtype=object)

So, per our list comprehension model:

>>> [(a, b, c) for (a, b, c) in df[['LastName', 'FirstName', 'MiddleInitial']].values]
[('Smith', 'John', 'P'), ('Jones', 'Bill', 'Missing'), ('Henry', 'Paul', 'H')]

I then format the string:

a = "Smith"
b = "John"
c = "P"
>>> "{0}, {1} {2}".format(a, b, c)
"Smith, John P"

I use a ternary to check if the middle name is 'Missing', so:

middle if middle != "Missing" else ""

is equivalent to:

if middle == 'Missing':
    middle = ""

Finally, I added .strip() to remove the extra space in case the middle name is missing.

Upvotes: 1

Related Questions