Christopher

Reputation: 2232

Pandas: Applying function to rows, writing into new columns

Applying functions to dataframe

I currently have the following dataframe:

Data

url                            visitors
http://somedomain.com          200000
http://someotherdomain.com     150000
http://somenewdomain.com       11000

For every row in the dataframe, I would like to apply two functions to the url column and write each result into its own new column, 'meta' and 'content'.

Functions:

def metacrawler(url):
    ...
    return data

def contentcrawler(url):
    ...
    return data

# Counter
progress = 0

Loop

for index, row in data.iterrows():
    print(str(progress)," out of ",str(len(data)))
    print('Starting meta crawling.')
    row['meta'] = metacrawler(row["url"])
    print('Starting content crawling.')
    row['content'] = contentcrawler(row["url"])
    print('Complete.')
    progress += 1

However, when I aborted the process after a few iterations, I found that no data had been written to the dataframe. No columns had been created either.

What did I do wrong?

Solution

def func(row):
    print("Crawling Meta")
    meta = metacrawler(row["url"])
    print("Crawling Content")
    content = contentcrawler(row["url"])
    return meta, content

data[['meta', 'content']] = data.apply(func, axis=1, result_type='expand')

Upvotes: 0

Views: 272

Answers (1)

pypypy

Reputation: 1105

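You can use DataFrame.apply() with axis=1 and result_type='expand' (see the pandas docs). When the applied function returns a tuple, each element is expanded into its own column: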

In [3]: df = pd.DataFrame({'one':[1,2,3,4], 'two':[5,6,7,8]})

In [4]: df.apply(lambda x: (sum(x), sum(x)), axis=1, result_type='expand')
Out[4]:
    0   1
0   6   6
1   8   8
2  10  10
3  12  12

In [5]: df[['new', 'etc']] = df.apply(lambda x: (sum(x), sum(x)), axis=1, result_type='expand')

In [6]: df
Out[6]:
   one  two  new  etc
0    1    5    6    6
1    2    6    8    8
2    3    7   10   10
3    4    8   12   12
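As for why the original loop wrote nothing: iterrows() hands back a separate Series for each row, so adding a key to that Series never creates a column on the dataframe. A minimal sketch, assuming a cut-down copy of the question's data and a placeholder string in place of the real crawler result:

```python
import pandas as pd

data = pd.DataFrame({
    "url": ["http://somedomain.com", "http://someotherdomain.com"],
    "visitors": [200000, 150000],
})

for index, row in data.iterrows():
    # row is a Series built for this iteration only; enlarging it with a
    # new key changes that Series, not the dataframe it came from.
    row["meta"] = "placeholder"  # placeholder value, not a real crawl result

has_meta = "meta" in data.columns
print(has_meta)  # False
```

That is why the result has to be assigned on the dataframe itself, as in the apply() example above.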

Edit: If you want to show progress, define the applied function separately and print from inside it, e.g.

def func(row):
    print(row)
    return sum(row), sum(row)


In [3]: df = pd.DataFrame({'one':[1,2,3,4], 'two':[5,6,7,8]})

In [4]: df.apply(func, axis=1, result_type='expand')
Out[4]:
    0   1
0   6   6
1   8   8
2  10  10
3  12  12

In [5]: df[['new', 'etc']] = df.apply(func, axis=1, result_type='expand')

In [6]: df
Out[6]:
   one  two  new  etc
0    1    5    6    6
1    2    6    8    8
2    3    7   10   10
3    4    8   12   12
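Applied to the dataframe from the question, the same idea might look like the sketch below. The two crawler bodies are hypothetical stand-ins (the question elides the real ones), and crawl_row and total are names made up for this example. It assumes the default integer index, so row.name can double as the row position:

```python
import pandas as pd

# Hypothetical stand-ins for the real crawlers, which the question elides.
def metacrawler(url):
    return "meta:" + url

def contentcrawler(url):
    return "content:" + url

data = pd.DataFrame({
    "url": [
        "http://somedomain.com",
        "http://someotherdomain.com",
        "http://somenewdomain.com",
    ],
    "visitors": [200000, 150000, 11000],
})

total = len(data)

def crawl_row(row):
    # With axis=1, row.name is the row's index label; for the default
    # RangeIndex that is also its position.
    print(f"{row.name + 1} out of {total}")
    return metacrawler(row["url"]), contentcrawler(row["url"])

# Each element of the returned tuple becomes one new column.
data[["meta", "content"]] = data.apply(crawl_row, axis=1, result_type="expand")
```

Reading the position from row.name avoids keeping a separate counter variable. If the index is not a plain 0..n-1 range, a counter would be needed instead.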

Upvotes: 2
