Pandas: loop through each row, extract features and create new columns

Question

I have a data frame containing one column with different names. I extract features from these names and store them into a dictionary. Then I want to create a column for each feature and store values for each name. I'm struggling to get my loop right.

My code:

import pandas as pd

data = pd.DataFrame(['Mike', 'Ester', 'Sarah'])
data.columns = ['name']

def get_features(name):
    features = {}
    features["firstletter"] = name[0].lower()
    features["lastletter"] = name[-1].lower()
    return features

for name in data['name']:
    features = get_features(name)
    print features
    for f,v in features.items():
        data[f] = v
data.head()

I get:

name    lastletter  firstletter
0   Mike    h   s
1   Ester   h   s
2   Sarah   h   s

I need:

name    lastletter  firstletter
0   Mike    e   m
1   Ester   r   e
2   Sarah   h   s

I understand why all names get values from the last name but I cannot figure out how to fix it. I probably can create new headers for all features first and then update my data frame but I hope there is a smarter way. Will appreciate your help!

EDIT: My feature function is much more complicated than just first/last letter. It contains around 20 different features so I really need to build a dictionary...

def get_features(name):
    features = {}
    features["firstletter"] = name[0].lower()
    features["lastletter"] = name[-1].lower()
    features["hythen"] = ("-" in name.lower())
    features["suffix"] = name[-2:].lower()
    features["prefix"] = name[0:2].lower()
    features["length"] = len(name)
    for letter in 'abcdefghijklmnopqrstuvwxyz':
        features["count(%s)" % letter] = name.lower().count(letter)
        features["has(%s)" % letter] = (letter in name.lower())
    return features

MaxU - stand with Ukraine · Accepted Answer

I'd do it this way:

In [107]: data[['first_letter','last_letter']] = \
              data.name.str.lower().str.extract(r'^(.).*(.)$', expand=True)

In [108]: data
Out[108]:
    name first_letter last_letter
0   Mike            m           e
1  Ester            e           r
2  Sarah            s           h

UPDATE:

In [127]: df.join(pd.DataFrame.from_records(df.apply(lambda x: get_features(x['name']),
                                                     axis=1).values, 
                                            index=df.index))
Out[127]:
    name  count(a)  count(b)  count(c)  count(d)  count(e)  count(f)  \
0   Mike         0         0         0         0         1         0
1  Ester         0         0         0         0         2         0
2  Sarah         2         0         0         0         0         0

   count(g)  count(h)  count(i)   ...    has(v)  has(w)  has(x)  has(y)  \
0         0         0         1   ...     False   False   False   False
1         0         0         0   ...     False   False   False   False
2         0         1         0   ...     False   False   False   False

   has(z)  hythen  lastletter  length  prefix  suffix
0   False   False           e       4      mi      ke
1   False   False           r       5      es      er
2   False   False           h       5      sa      ah

[3 rows x 59 columns]

Pandas: loop through each row, extract features and create new columns

Answers (2)

Related Questions