BruceWayne

Reputation: 23283

TypeError: string indices must be integers using pandas apply with lambda

I have a dataframe with two columns: one holds a URL, the other a name. I'm simply trying to add a third column that takes the URL and turns it into an HTML link.

The column newsSource has the Link name, and url has the URL. For each row in the dataframe, I want to create a column that has:

<a href="[the url]">[newsSource name]</a>

Trying the code below throws this error:

File "C:\Users\AwesomeMan\Documents\Python\MISC\News Alerts\simple_news.py", line 254, in df['sourceURL'] = df['url'].apply(lambda x: '{1}'.format(x, x[0]['newsSource']))
TypeError: string indices must be integers

df['sourceURL'] = df['url'].apply(lambda x: '<a href="{0}">{1}</a>'.format(x, x['source']))

But I've used x[colName] before? The below line works fine, it simply creates a column of the source's name:

df['newsSource'] = df['source'].apply(lambda x: x['name'])

Why suddenly ("suddenly" to me) is it saying I can't access the indices?

Upvotes: 4

Views: 3059

Answers (2)

jpp

Reputation: 164693

pd.Series.apply has access only to a single series, i.e. the series on which you call the method. In other words, the function you supply, whether named or an anonymous lambda, receives only the individual values of df['url']. Each of those values is a string, so x['source'] tries to index a string with a string key, which raises the TypeError.
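To make that concrete, here is a minimal sketch that reproduces the error. The sample data is hypothetical, and it assumes the source column holds dicts with a 'name' key, which would explain why the asker's earlier x['name'] line worked.

```python
import pandas as pd

# Hypothetical sample data; the dict in 'source' is an assumption
# based on the asker's working line df['source'].apply(lambda x: x['name']).
df = pd.DataFrame({
    'url': ['http://example.com/a'],
    'source': [{'name': 'Example News'}],
})

# Series.apply passes one element at a time, not a row.
element_types = df['url'].apply(lambda x: type(x).__name__).tolist()
print(element_types)

# Elements of 'source' are dicts here, so a string key is valid.
names = df['source'].apply(lambda x: x['name']).tolist()
print(names)

# Elements of 'url' are plain strings, so a string key raises TypeError.
error_message = None
try:
    df['url'].apply(lambda x: x['source'])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it is the same TypeError as in the question.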

To access multiple series by row, you need pd.DataFrame.apply along axis=1:

def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)

Note there is an overhead associated with passing an entire series in this way; pd.DataFrame.apply is just a thinly veiled, inefficient loop.

You may find a list comprehension more efficient:

df['sourceURL'] = ['<a href="{0}">{1}</a>'.format(i, j)
                   for i, j in zip(df['url'], df['source'])]
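If you want to check the difference on your own data, a rough sketch like the following compares the two approaches. The frame, the row count, and the helper names with_apply and with_zip are made up for illustration, and the timings will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical data: 1,000 rows of made-up URLs and source names.
n = 1000
df = pd.DataFrame({
    'url': ['http://example.com/{}'.format(i) for i in range(n)],
    'source': ['Source {}'.format(i) for i in range(n)],
})

def return_link(row):
    # Called once per row when used with df.apply(..., axis=1).
    return '<a href="{0}">{1}</a>'.format(row['url'], row['source'])

def with_apply():
    return df.apply(return_link, axis=1).tolist()

def with_zip():
    return ['<a href="{0}">{1}</a>'.format(i, j)
            for i, j in zip(df['url'], df['source'])]

# Both approaches should build exactly the same strings.
same_output = with_apply() == with_zip()
print(same_output)

# Time a few repetitions of each; no particular ratio is guaranteed.
apply_seconds = timeit.timeit(with_apply, number=5)
zip_seconds = timeit.timeit(with_zip, number=5)
print('apply: {:.4f}s, zip: {:.4f}s'.format(apply_seconds, zip_seconds))
```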

Here's a working demo:

df = pd.DataFrame([['BBC', 'http://www.bbc.o.uk']],
                  columns=['source', 'url'])

def return_link(x):
    return '<a href="{0}">{1}</a>'.format(x['url'], x['source'])

df['sourceURL'] = df.apply(return_link, axis=1)

print(df)

  source                  url                              sourceURL
0    BBC  http://www.bbc.o.uk  <a href="http://www.bbc.o.uk">BBC</a>

Upvotes: 5

BENY

Reputation: 323306

With zip and old-school % string formatting:

df['sourceURL'] = ['<a href="%s">%s</a>' % (x, y) for x, y in zip(df['url'], df['source'])]

Or with an f-string (Python 3.6+):

df['sourceURL'] = [f'<a href="{x}">{y}</a>' for x, y in zip(df['url'], df['source'])]
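As a quick sanity check, this small sketch builds the links both ways and confirms they match. It reuses the one-row sample frame from the other answer; the variable names percent_links and fstring_links are only for illustration.

```python
import pandas as pd

# One-row sample frame borrowed from the other answer's demo.
df = pd.DataFrame([['BBC', 'http://www.bbc.o.uk']],
                  columns=['source', 'url'])

# %-style formatting: each %s is filled from the tuple in order.
percent_links = ['<a href="%s">%s</a>' % (x, y)
                 for x, y in zip(df['url'], df['source'])]

# f-string formatting: the names are interpolated directly.
fstring_links = [f'<a href="{x}">{y}</a>'
                 for x, y in zip(df['url'], df['source'])]

print(percent_links == fstring_links)

# Either list can be assigned as the new column.
df['sourceURL'] = fstring_links
print(df['sourceURL'].iloc[0])
```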

Upvotes: 2
