Using a for loop to concatenate columns in Pandas

Question

I'm quite new to Python and I'm trying to use Pandas (in iPython Notebook, Python 3) to combine three columns. This is the original data:

   RegistrationID  FirstName  MiddleInitial  LastName
   1               John       P              Smith
   2               Bill       Missing        Jones
   3               Paul       H              Henry

And I'd like to have:

   RegistrationID  FirstName  MiddleInitial  LastName  FullName
   1               John       P              Smith     Smith, John, P
   2               Bill       Missing        Jones     Jones, Bill
   3               Paul       H              Henry     Henry, Paul, H

I'm sure this is absolutely not the correct way of doing this, but this is how I have set it up so far in a for loop. Unfortunately, it just keeps going and going and never finishes.

%matplotlib inline
import pandas as pd

from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))

reg = pd.DataFrame.from_csv('regcontact.csv', index_col=RegistrationID)

for item, frame in regcombo['MiddleInitial'].iteritems():
    while frame == 'Missing':
        reg['FullName'] = reg.LastName.map(str) + ", " + reg.FirstName
    else: break

The idea is then to add another column for those with complete names (i.e. including MiddleInitial):

for item, frame in regcombo['MiddleInitial'].iteritems():
    while frame != 'Missing':
        reg['FullName1'] = reg.LastName.map(str) + ", " + reg.FirstName + ", " + reg.MiddleInitial
    else: break

And then combine them, so that there are no null values. I've looked everywhere, but I can't quite figure it out. Any help would be appreciated, and I apologize in advance if I have broken any conventions, as this is my first post.

Alexander · Accepted Answer

This uses a list comprehension to create the new dataframe column, e.g. [(a, b, c) for a, b, c in some_iterable_item].

df['Full Name'] = [
   "{0}, {1} {2}"
   .format(last, first, middle if middle != 'Missing' else "").strip() 
   for last, first, middle 
   in df[['LastName', 'FirstName', 'MiddleInitial']].values]

>>> df
   RegistrationID FirstName MiddleInitial LastName      Full Name
0               1      John             P    Smith  Smith, John P
1               2      Bill       Missing    Jones    Jones, Bill
2               3      Paul             H    Henry  Henry, Paul H
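
To try this without the original CSV, here is a minimal self-contained sketch. It assumes the three sample rows from the question are typed in by hand, and it reuses the name df from the snippet above; nothing else is taken from the asker's actual file.

```python
import pandas as pd

# Sample rows copied from the question (assumption: the real CSV has the same columns).
df = pd.DataFrame({
    "RegistrationID": [1, 2, 3],
    "FirstName": ["John", "Bill", "Paul"],
    "MiddleInitial": ["P", "Missing", "H"],
    "LastName": ["Smith", "Jones", "Henry"],
})

# Build one formatted string per row; a 'Missing' middle initial becomes an
# empty string, and .strip() removes the trailing space that this leaves.
df["Full Name"] = [
    "{0}, {1} {2}".format(last, first, middle if middle != "Missing" else "").strip()
    for last, first, middle in df[["LastName", "FirstName", "MiddleInitial"]].values
]

print(df["Full Name"].tolist())
# ['Smith, John P', 'Jones, Bill', 'Henry, Paul H']
```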

Here, some_iterable_item is the array of values from the dataframe:

>>> df[['LastName', 'FirstName', 'MiddleInitial']].values
array([['Smith', 'John', 'P'],
       ['Jones', 'Bill', 'Missing'],
       ['Henry', 'Paul', 'H']], dtype=object)

So, per our list comprehension model:

>>> [(a, b, c) for (a, b, c) in df[['LastName', 'FirstName', 'MiddleInitial']].values]
[('Smith', 'John', 'P'), ('Jones', 'Bill', 'Missing'), ('Henry', 'Paul', 'H')]

I then format the string:

>>> a = "Smith"
>>> b = "John"
>>> c = "P"
>>> "{0}, {1} {2}".format(a, b, c)
'Smith, John P'

I use a ternary (conditional expression) to replace a middle name of 'Missing' with an empty string, so:

middle if middle != "Missing" else ""

is equivalent to:

if middle == 'Missing':
    middle = ""
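
As a quick check of that equivalence, the two forms can be wrapped in small functions and compared. The function names below are hypothetical and exist only for this illustration.

```python
def blank_if_missing_expr(middle):
    # Conditional expression: yields "" when the value is 'Missing'.
    return middle if middle != "Missing" else ""


def blank_if_missing_stmt(middle):
    # The same logic written as an if statement.
    if middle == "Missing":
        middle = ""
    return middle


# Both forms agree on every middle initial from the sample data.
for value in ["P", "Missing", "H"]:
    assert blank_if_missing_expr(value) == blank_if_missing_stmt(value)
```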

Finally, I added .strip() to remove the trailing space that is left when the middle name is missing and the third field is filled with an empty string.
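
Note that the question asked for a comma before the middle initial ("Smith, John, P"), whereas the format string above produces "Smith, John P". If the comma form is wanted, one possible variation is to collect only the parts that are present and join them with ", ", which also makes .strip() unnecessary. This is a sketch rather than part of the answer above: the helper full_name and its missing parameter are hypothetical names, and the sample frame is typed in from the question.

```python
import pandas as pd


def full_name(last, first, middle, missing="Missing"):
    """Join the name parts with ', ', skipping a missing middle initial."""
    parts = [last, first]
    if middle != missing:
        parts.append(middle)
    return ", ".join(parts)


# Sample rows copied from the question.
df = pd.DataFrame({
    "RegistrationID": [1, 2, 3],
    "FirstName": ["John", "Bill", "Paul"],
    "MiddleInitial": ["P", "Missing", "H"],
    "LastName": ["Smith", "Jones", "Henry"],
})

# zip walks the three columns in parallel, one row at a time.
df["FullName"] = [
    full_name(last, first, middle)
    for last, first, middle in zip(df["LastName"], df["FirstName"], df["MiddleInitial"])
]

print(df["FullName"].tolist())
# ['Smith, John, P', 'Jones, Bill', 'Henry, Paul, H']
```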
