AlexSB

Reputation: 607

.apply() function to dataframe and return new dataframe?

What is the best way to create a new DataFrame from a function applied to each row of a DataFrame? The ultimate goal is to concat (rbind) all the resulting DataFrames.

Input:

   Name  Age
0   tom   10
1  nick   15
2  juli   14

Example:

import pandas as pd
import pdb

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

def foo(row):
 #pdb.set_trace()
 new_df = row.to_frame(name='Values')
 new_df.loc[new_df.index=='Name','New_column'] = 'Surname'
 new_df.loc[new_df.index=='Age','New_column'] = '+5 months'
 return new_df

df.apply(foo, axis=1)

Output:

data = {'Values': ['tom', '10', 'nick', '15', 'juli', '14'],
        'New_column': ['Surname', '+5 months', 'Surname', '+5 months',
                       'Surname', '+5 months']}
output = pd.DataFrame(data)

  Values New_column
0    tom    Surname
1     10  +5 months
2   nick    Surname
3     15  +5 months
4   juli    Surname
5     14  +5 months

If .apply() is not the best option, I would appreciate an alternative.

For R users, I am looking for the equivalent of do.call(rbind, sapply()).

Thanks.

Upvotes: 2

Views: 1240

Answers (4)

Erfan

Reputation: 42916

Instead of apply, which is pretty slow, we can use pandas and numpy methods here: transpose (.T), melt and numpy.tile:

import numpy as np

df = df.T.melt().drop(columns='variable')
df['New_column'] = np.tile(['Surname', '+5 months'], len(df)//2)

  value New_column
0   tom    Surname
1    10  +5 months
2  nick    Surname
3    15  +5 months
4  juli    Surname
5    14  +5 months
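
The np.tile call relies on the columns being in the order Name, Age. A minimal sketch of a variant that looks up the label by column name instead is shown here; the `labels` dict and the output column name `Values` are assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])

# Hypothetical mapping from source column to the text wanted in New_column.
labels = {'Name': 'Surname', 'Age': '+5 months'}

# Flatten the frame row by row: tom, 10, nick, 15, juli, 14.
values = df.to_numpy().ravel()
# Repeat the column names once per row so each value knows its source column.
source = np.tile(df.columns.to_numpy(), len(df))

out = pd.DataFrame({'Values': values,
                    'New_column': pd.Series(source).map(labels)})
print(out)
```

Because the label comes from the column name, reordering or adding columns only requires updating `labels`.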

Upvotes: 1

Henry James

Reputation: 145

Perhaps try:

df = df.apply(foo, axis=1)

Upvotes: 0

Andrea Grioni

Reputation: 187

Here is a different approach that uses built-in functions of pandas and numpy.

import pandas as pd
import numpy as np

# create df
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# provide a unique id for each row
df['id'] = df.index
# unpivot the DataFrame, using the unique id as reference
n = df.melt(id_vars=['id'], value_vars=['Name', 'Age'])
# add 'new_column' and fill its values with np.where
n['new_column'] = np.where(n['variable'] == 'Name', 'Surname', '+5 months')
# stable sort by id so each name stays directly above its age
n = n.sort_values('id', kind='stable')
# use the original column names as row labels
n.index = n['variable']
# drop the helper columns (assign the result; drop() returns a new frame)
n = n.drop(columns=['id', 'variable'])

output:

           value    new_column
variable        
Name       tom      Surname
Age        10       +5 months
Name       nick     Surname
Age        15       +5 months
Name       juli     Surname
Age        14       +5 months

Upvotes: 0

Valdi_Bo

Reputation: 30971

Start from one improvement in your function:

def foo(row):
    new_df = row.to_frame(name='Values')
    new_df.loc['Name', 'New_column'] = 'Surname'
    new_df.loc['Age', 'New_column'] = '+5 months'
    return new_df

("new_df.index==" is not needed).

To get your output, convert the Series of DataFrames (resulting from apply) into an ordinary list (of DataFrames) and concatenate them.

The code to do it is:

pd.concat(df.apply(foo, axis=1).tolist())
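
If you also want the fresh 0..5 index from the expected output in the question, a self-contained sketch along the same lines is given here. It builds the list of per-row frames with iterrows rather than apply and passes ignore_index=True to pd.concat; the `labels` dict is a hypothetical helper introduced for illustration, and this is one possible reading of the R idiom do.call(rbind, ...), not the only one.

```python
import pandas as pd

df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]],
                  columns=['Name', 'Age'])

# Hypothetical mapping from row label to the text wanted in New_column.
labels = {'Name': 'Surname', 'Age': '+5 months'}

def foo(row):
    # One input row becomes a two-row frame indexed by 'Name' and 'Age'.
    new_df = row.to_frame(name='Values')
    new_df['New_column'] = [labels[key] for key in new_df.index]
    return new_df

# Build a plain list of DataFrames, then stack them (the rbind step).
frames = [foo(row) for _, row in df.iterrows()]
result = pd.concat(frames, ignore_index=True)
print(result)
```

Dropping ignore_index=True keeps the repeated Name/Age labels as the index instead.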

Upvotes: 2
