johnJones901

Reputation: 47

Function that adds multiple columns to dataframe based on calculations - Pandas

I have a dataframe like so:

name:   ...  line: 
bobo    ...   10
amy     ...   5
amanda  ...   15

I want to create a function that can be used for multiple dataframes, which adds new columns to the dataframe based on the calculations within the function. This is what I am trying to do with the function, but it isn't working.

def check(df, lines):
    
    for line in lines:
        df['big_line'] = (line*5, line)
        df['small_line'] = line*2
        df['massive_line'] = line*10
        df['line_word'] = line + ' line'
        
    return df

Essentially, what I am trying to get it to return is the dataframe looking like this:

Function call:

check(df, df['line'])

Return:

name:   ...  line: big_line: small_line: massive_line: line_word:
bobo    ...   10   (50, 10)         20           100         10 line
amy     ...   5     (25, 5)         10            50          5 line
amanda  ...   15  ...............................................

If someone could point me in the right direction that would be great. Thanks.

I am getting an error with big_line because it is a tuple.

Upvotes: 1

Views: 99

Answers (3)

Carmoreno

Reputation: 1319

You are assigning a sequence to a whole column (a Series). The tuple has only 2 elements, but the dataframe has more than 2 rows, so the lengths do not match. This answer can help you to understand the error. Formatting the pair as a string avoids it:

def check(df, lines):
    for line in lines.to_list():
        df['big_line'] = f"({line*5}, {line})"
        df['small_line'] = line*2
        df['massive_line'] = line*10
        df['line_word'] = f"{line} line"
    return df

check(df, df['line'])

Output:

    name    line    big_line    small_line  massive_line    line_word
0   bobo      10    (75, 15)         30     150             15 line
1   amy       5     (75, 15)         30     150             15 line
2   amanda    15    (75, 15)         30     150             15 line
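
To see why every row ends up with the last value, here is a minimal sketch. The sample frame is rebuilt from the question's data, and the variable `error_message` is only there for illustration. A tuple is treated as a list of column values, so its length must match the number of rows, while a scalar is broadcast to all rows on every pass of the loop.

```python
import pandas as pd

df = pd.DataFrame({"name": ["bobo", "amy", "amanda"], "line": [10, 5, 15]})

# A 2-element tuple is read as column values, so pandas expects 3 of them.
try:
    df["big_line"] = (50, 10)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A scalar is broadcast to every row, so each pass overwrites the whole column
# and only the value from the last pass (15 * 2) survives.
for line in df["line"]:
    df["small_line"] = line * 2

print(error_message)
print(df["small_line"].tolist())
```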

EDIT: According to your comment, if you want to update each row of your original dataframe, I suggest modifying your function to address each row by its index, using the loc method:

def check(df, lines):
    for index, line in enumerate(lines.to_list()):
        df.loc[index, 'big_line'] = f"({line*5}, {line})"
        df.loc[index, 'small_line'] = line*2
        df.loc[index, 'massive_line'] = line*10
        df.loc[index, 'line_word'] = f"{line} line"
    return df

Output:

    name    line    big_line    small_line  massive_line    line_word
0   bobo    10     (50, 10)            20   100             10 line
1   amy     5      (25, 5)             10   50               5 line
2   amanda  15     (75, 15)            30   150             15 line
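
Note that enumerate yields positions 0, 1, 2, which only match the row labels when the dataframe has the default integer index. A possible variant, assuming the frame may be indexed by something else (here by name), iterates over the labels instead. The function name `check_by_label` and the `column` parameter are made up for this sketch.

```python
import pandas as pd

# Example frame indexed by name instead of 0, 1, 2 (an assumption for illustration).
df = pd.DataFrame({"line": [10, 5, 15]}, index=["bobo", "amy", "amanda"])

def check_by_label(df, column="line"):
    out = df.copy()  # leave the caller's frame untouched
    # Series.items() yields (label, value) pairs, so loc gets real row labels.
    for label, line in out[column].items():
        out.loc[label, "big_line"] = f"({line * 5}, {line})"
        out.loc[label, "small_line"] = line * 2
        out.loc[label, "massive_line"] = line * 10
        out.loc[label, "line_word"] = f"{line} line"
    return out

result = check_by_label(df)
print(result)
```

With positional numbers on this frame, loc would append new rows labelled 0, 1, 2 rather than fill the existing ones.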

Upvotes: 1

user12014098

Reputation: 21

If you just want a string you could try:

df['big_line'] = f'({5*line}, {line})'

If it needs to be a tuple then include this after creating the string:

df['big_line'] = df.big_line.apply(lambda x: eval(x))
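
As a hedged alternative to eval, the standard library's ast.literal_eval parses only Python literals, which is usually the safer choice when turning strings back into tuples. A small sketch of the round trip, with the sample data taken from the question:

```python
import ast

import pandas as pd

df = pd.DataFrame({"line": [10, 5, 15]})

# Build one string per row, e.g. "(50, 10)".
df["big_line"] = [f"({5 * line}, {line})" for line in df["line"]]

# Parse each string back into a real tuple without executing arbitrary code.
df["big_line"] = df["big_line"].apply(ast.literal_eval)

print(df["big_line"].tolist())
```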

Upvotes: 1

mozway

Reputation: 260500

Using a function that computes the output per row

Input:

import pandas as pd
df = pd.DataFrame({'line': [10,5,15]}, index=['bobo', 'amy', 'amanda']).rename_axis(index='name')

        line
name        
bobo      10
amy        5
amanda    15

You can define a function that returns a Series:

def check(s):
    line = s['line']
    return pd.Series({'big_line': (line*5, line),
                      'small_line': line*2,
                      'massive_line': line*10,
                      'line_word': str(line)+' line'
                     })

Then apply it to the rows:

df.apply(check, axis=1)

Output:

        big_line  small_line  massive_line line_word
name                                                
bobo    (50, 10)          20           100   10 line
amy      (25, 5)          10            50    5 line
amanda  (75, 15)          30           150   15 line
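
The apply call returns a new dataframe with only the computed columns. If you want them next to the original ones, one option (a sketch, not the only way) is to join the result back on the index. The name `result` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"line": [10, 5, 15]},
                  index=["bobo", "amy", "amanda"]).rename_axis(index="name")

def check(s):
    # s is one row of the dataframe, passed in as a Series.
    line = s["line"]
    return pd.Series({"big_line": (line * 5, line),
                      "small_line": line * 2,
                      "massive_line": line * 10,
                      "line_word": str(line) + " line"})

# join aligns on the shared index, so each row gets its own computed values.
result = df.join(df.apply(check, axis=1))
print(result)
```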

Using vector operations

df['big_line']     = df['line'].apply(lambda x: (5*x, x))
df['small_line']   = df['line']*2
df['massive_line'] = df['line']*10
df['line_word']    = df['line'].astype(str)+' line'
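
Since the question asks for something reusable across dataframes, the column-wise version can be wrapped in a function. This is a minimal sketch: the name `add_line_columns` and the `column` parameter are my own, and it builds the tuples with zip rather than apply.

```python
import pandas as pd

def add_line_columns(df, column="line"):
    out = df.copy()  # return a new frame instead of mutating the input
    # zip pairs each multiplied value with its original, one tuple per row.
    out["big_line"] = list(zip(out[column] * 5, out[column]))
    out["small_line"] = out[column] * 2
    out["massive_line"] = out[column] * 10
    out["line_word"] = out[column].astype(str) + " line"
    return out

df = pd.DataFrame({"name": ["bobo", "amy", "amanda"], "line": [10, 5, 15]})
result = add_line_columns(df)
print(result)
```

Working on a copy keeps the original dataframe unchanged. Drop the copy call if you prefer to modify it in place.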

Upvotes: 2
