Frames Catherine White

Reputation: 28222

Is there a multiple column map function for dataframes?

In pandas, how can one column be derived from multiple other columns?

For example, let's say I want to annotate my dataset with the correct form of address for each subject, perhaps to label some plots so I can tell who the results are for.

Take a dataset:

import pandas as pd

data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'),
        ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

So we have:

   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson

And a function that I want to apply to each row, storing the result in a new column:

def get_address(gender, first, last):
    title=""
    if gender=='male':
        title='Mr'
    elif gender=='female':
        title='Ms'

    if title=='':
        return first + ' '+ last
    else:
        return title + ' ' + first[0] + '. ' + last

Currently my method is:

people['address'] = list(map(lambda row: get_address(*row), people.values))

which produces:

   gender first_name last_name         address
0    male      Homer   Simpson   Mr H. Simpson
1  female      Marge   Simpson   Ms M. Simpson
2    male       Bart   Simpson   Mr B. Simpson
3  female       Lisa   Simpson   Ms L. Simpson
4  infant     Maggie   Simpson  Maggie Simpson

It works, but it is not elegant. It also feels wrong to convert to an unindexed list and then assign the result back into an indexed column.
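If the main worry is the round trip through an unindexed list, one option keeps the original three-argument get_address. Zip the three columns together and wrap the results in a Series built with people.index. This is a minimal sketch, not part of the original question. When you assign a Series, pandas aligns rows by index label. A plain list is simply assigned by position.

```python
import pandas as pd

def get_address(gender, first, last):
    # Same logic as the get_address in the question
    title = ""
    if gender == 'male':
        title = 'Mr'
    elif gender == 'female':
        title = 'Ms'

    if title == '':
        return first + ' ' + last
    else:
        return title + ' ' + first[0] + '. ' + last

data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'),
        ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

# Walk the three columns in lockstep, then attach people.index so the
# assignment below is matched to rows by label rather than by position
people['address'] = pd.Series(
    [get_address(g, f, l)
     for g, f, l in zip(people['gender'], people['first_name'], people['last_name'])],
    index=people.index,
)
print(people)
```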

Upvotes: 0

Views: 948

Answers (2)

ZJS

Reputation: 4051

What you are looking for is apply(func, axis=1). This applies a function row-wise across your DataFrame and passes each row to func as a Series.

In your example, modify your get_address function to take a single row:

def get_address(row):  # row is a pandas Series with the column names as its index
    title = ""
    gender = row['gender']     # extract gender from the row
    first = row['first_name']  # extract first name from the row
    last = row['last_name']    # extract last name from the row

    if gender == 'male':
        title = 'Mr'
    elif gender == 'female':
        title = 'Ms'

    if title == '':
        return first + ' ' + last
    else:
        return title + ' ' + first[0] + '. ' + last

Then call people.apply(get_address, axis=1), which returns a new column. Strictly, it returns a pandas Series that carries the DataFrame's index, and that index is how the DataFrame knows where each value belongs. To add it to your DataFrame:

people['address'] = people.apply(get_address,axis=1)
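The following self-contained sketch puts the answer together and demonstrates the index point above. It uses hypothetical non-default row labels ('a' to 'e'). It also assigns the result in reversed order to show that pandas matches rows by label, not by position. The dict lookup for the title is a compact rewrite of the if/elif in the answer, and its behavior is the same.

```python
import pandas as pd

def get_address(row):
    # row is a Series indexed by the column names
    gender, first, last = row['gender'], row['first_name'], row['last_name']
    title = {'male': 'Mr', 'female': 'Ms'}.get(gender, '')  # '' for anything else
    if title == '':
        return first + ' ' + last
    return title + ' ' + first[0] + '. ' + last

data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'),
        ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
# Hypothetical labels, used only to make index alignment visible
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"],
                      index=['a', 'b', 'c', 'd', 'e'])

addresses = people.apply(get_address, axis=1)
print(type(addresses).__name__)   # Series
print(addresses.index.tolist())   # ['a', 'b', 'c', 'd', 'e']

# Even in reversed order, each value lands on the row with the matching label
people['address'] = addresses.iloc[::-1]
print(people)
```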

Upvotes: 2

Phillip Cloud

Reputation: 25672

You can do this without any explicit looping. The session below assumes numpy has been imported as np and Series has been imported from pandas:

In [70]: df
Out[70]:
   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson

In [71]: title = df.gender.replace({'male': 'Mr', 'female': 'Ms', 'infant': ''})

In [72]: initial = np.where(df.gender != 'infant', df.first_name.str[0] + '. ', df.first_name + ' ')
In [73]: initial
Out[73]: array(['H. ', 'M. ', 'B. ', 'L. ', 'Maggie '], dtype=object)

In [74]: address = (title + ' ' + Series(initial) + df.last_name).str.strip()

In [75]: address
Out[75]:
0     Mr H. Simpson
1     Ms M. Simpson
2     Mr B. Simpson
3     Ms L. Simpson
4    Maggie Simpson
dtype: object
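Below is a self-contained script version of the session above. It is a sketch rather than the answer's exact code and differs in two places. First, it uses Series.map plus fillna instead of replace, so any gender missing from the dict gets an empty title. Second, it keys the initial-vs-full-name choice on whether a title exists rather than on gender != 'infant'. It also builds the intermediate Series with df.index explicitly.

```python
import numpy as np
import pandas as pd

data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'),
        ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
df = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

# Genders not in the dict (here 'infant') map to NaN; fillna turns that into ''
title = df['gender'].map({'male': 'Mr', 'female': 'Ms'}).fillna('')

# Initial plus '. ' when there is a title, otherwise the full first name
has_title = title != ''
first = pd.Series(
    np.where(has_title, df['first_name'].str[0] + '. ', df['first_name'] + ' '),
    index=df.index,
)

# Element-wise concatenation; strip drops the leading space left by an empty title
df['address'] = (title + ' ' + first + df['last_name']).str.strip()
print(df['address'].tolist())
```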

Check out the documentation for the Series.str methods; they're pretty rad. Most methods of Python's built-in str are implemented, in addition to goodies like extract.
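Here is a short tour of the .str accessor on a Series of names, to illustrate the point above. The regular expression passed to extract is an illustrative pattern chosen for this example; it is not taken from the answer.

```python
import pandas as pd

names = pd.Series(['Homer', 'Marge', 'Bart', 'Lisa', 'Maggie'])

print(names.str[0].tolist())        # ['H', 'M', 'B', 'L', 'M']
print(names.str.upper().tolist())   # ['HOMER', 'MARGE', 'BART', 'LISA', 'MAGGIE']
print(names.str.len().tolist())     # [5, 5, 4, 4, 6]

# extract puts each regex capture group into its own column;
# expand=True always returns a DataFrame
addresses = pd.Series(['Mr H. Simpson', 'Ms M. Simpson'])
parts = addresses.str.extract(r'(\w+) (\w)\. (\w+)', expand=True)
print(parts)
```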

Upvotes: 1
