Reputation: 854
I'm quite new to Python and I'm trying to use Pandas (in iPython Notebook, Python 3) to combine three columns. This is the original data:
RegistrationID FirstName MiddleInitial LastName
1 John P Smith
2 Bill Missing Jones
3 Paul H Henry
And I'd like to have:
RegistrationID FirstName MiddleInitial LastName FullName 1 John P Smith Smith, John, P 2 Bill Missing Jones Jones, Bill 3 Paul H Henry Henry, Paul, H
I'm sure this is absolutely not the correct way of doing this, but this is how I have set it up so far in a for loop. Unfortunately, it just keeps going and going and never finishes.
%matplotlib inline
import pandas as pd
from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))
reg = pd.DataFrame.from_csv('regcontact.csv', index_col=RegistrationID)
for item, frame in regcombo['MiddleInitial'].iteritems():
while frame == 'Missing':
reg['FullName'] = reg.LastName.map(str) + ", " + reg.FirstName
else: break
The idea is then to add another column for those with complete names (i.e. including MiddleInitial):
for item, frame in regcombo['MiddleInitial'].iteritems():
while frame != 'Missing':
reg['FullName1'] = reg.LastName.map(str) + ", " + reg.FirstName + ", " + reg.MiddleInitial
else: break
And then combine them, so that there are no null values. I've looked everywhere, but I can't quite figure it out. Any help would be appreciated, and I apologize in advance if I have broken any conventions, as this is my first post.
Upvotes: 0
Views: 2830
Reputation: 76391
All you need to do is add the columns:
>>> df.FirstName + ', ' + df.LastName + ', ' + df.FullName.str.replace(', Missing', '')
0 John, Smith, P
1 Bill, Jones, Missing
2 Paul, Henry, H
dtype: object
To add a new column, you could just write:
df['FullName'] = df.FirstName + ', ' + ...
(In Pandas, it is usually attempted to avoid loops and such.)
Upvotes: 1
Reputation: 109746
This uses a list comprehension to create the new dataframe column, e.g. [(a, b, c) for a, b, c in some_iterable_item].
df['Full Name'] = [
"{0}, {1} {2}"
.format(last, first, middle if middle != 'Missing' else "").strip()
for last, first, middle
in df[['LastName', 'FirstName', 'MiddleInitial']].values]
>>> df
RegistrationID FirstName MiddleInitial LastName Full Name
0 1 John P Smith Smith, John P
1 2 Bill Missing Jones Jones, Bill
2 3 Paul H Henry Henry, Paul H
The iterable_item is the array of values from the dataframe:
>>> df[['LastName', 'FirstName', 'MiddleInitial']].values
array([['Smith', 'John', 'P'],
['Jones', 'Bill', 'Missing'],
['Henry', 'Paul', 'H']], dtype=object)
So, per our list comprehension model:
>>> [(a, b, c) for (a, b, c) in df[['LastName', 'FirstName', 'MiddleInitial']].values]
[('Smith', 'John', 'P'), ('Jones', 'Bill', 'Missing'), ('Henry', 'Paul', 'H')]
I then format the string:
a = "Smith"
b = "John"
c = "P"
>>> "{0}, {1} {2}".format(a, b, c)
"Smith, John P"
I use a ternary to check if the middle name is 'Missing', so:
middle if middle != "Missing" else ""
is equivalent to:
if middle == 'Missing':
middle = ""
Finally, I added .strip()
to remove the extra space in case the middle name is missing.
Upvotes: 1