Reputation: 47
I have a dataframe like so:
name: ... line:
bobo ... 10
amy ... 5
amanda ... 15
I want to create a function that can be used for multiple dataframes, which adds new columns to the dataframe based on the calculations within the function. This is what I am trying to do with the function, but it isn't working.
def check(df, lines):
    for line in lines:
        df['big_line'] = (line*5, line)
        df['small_line'] = line*2
        df['massive_line'] = line*10
        df['line_word'] = line + ' line'
    return df
Essentially, what I am trying to get it to return is the dataframe looking like this:
Function call:
check(df, df['line'])
Return:
name: ... line: big_line: small_line: massive_line: line_word:
bobo ... 10 (50, 10) 20 100 10 line
amy ... 5 (25, 5) 10 50 5 line
amanda ... 15 ...............................................
If someone could point me in the right direction that would be great. Thanks.
I am getting an error with big_line because it is a tuple.
Upvotes: 1
Views: 99
Reputation: 1319
You are assigning a sequence to a Series object. Your sequence has only 2 elements, but the dataframe has more than 2 rows, so the lengths do not match. Storing the pair as a formatted string avoids the error:
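def check(df, lines):
    for line in lines.to_list():
        # Each pass overwrites the whole column with a single scalar
        df['big_line'] = f"({line*5}, {line})"
        df['small_line'] = line*2
        df['massive_line'] = line*10
        # str() is needed because line is an integer
        df['line_word'] = str(line) + ' line'
    return df

check(df, df['line'])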
Output:
name line big_line small_line massive_line line_word
0 bobo 10 (75, 15) 30 150 15 line
1 amy 5 (75, 15) 30 150 15 line
2 amanda 15 (75, 15) 30 150 15 line
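For reference, the length mismatch described at the top of this answer can be reproduced in isolation. This is only a minimal sketch, assuming a three-row dataframe like the one in the question; the variable name error_message is made up for illustration.

```python
import pandas as pd

# Assumed sample data, mirroring the question's dataframe
df = pd.DataFrame({"name": ["bobo", "amy", "amanda"], "line": [10, 5, 15]})

try:
    # A 2-element tuple is treated as a sequence of column values,
    # so pandas expects one element per row (3 here), not 2.
    df["big_line"] = (50, 10)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

A single string, by contrast, is a scalar, so pandas broadcasts it to every row, which is why the output above repeats the last value.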
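EDIT: According to your comment, if you want to update each row of your original dataframe, I suggest modifying your function to address each row by its number using the loc method. Note that this relies on the dataframe having the default integer index (0, 1, 2, ...), since loc selects by label: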
def check(df, lines):
    for index, line in enumerate(lines.to_list()):
        # Write only the cell in the current row
        df.loc[index, 'big_line'] = f"({line*5}, {line})"
        df.loc[index, 'small_line'] = line*2
        df.loc[index, 'massive_line'] = line*10
        # str() is needed because line is an integer
        df.loc[index, 'line_word'] = str(line) + ' line'
    return df
Output:
name line big_line small_line massive_line line_word
0 bobo 10 (50, 10) 20 100 10 line
1 amy 5 (25, 5) 10 50 5 line
2 amanda 15 (75, 15) 30 150 15 line
Upvotes: 1
Reputation: 21
If you just want a string you could try:
df['big_line'] = f'({5*line}, {line})'
If it needs to be a tuple then include this after creating the string:
df['big_line'] = df.big_line.apply(lambda x: eval(x))
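Since eval will execute any expression it is given, a narrower option may be ast.literal_eval from the standard library, which only parses Python literals such as tuples and numbers. A minimal sketch, assuming the string column is built row by row from an integer line column (the sample data is an assumption taken from the question):

```python
import ast

import pandas as pd

# Assumed sample data, mirroring the question's dataframe
df = pd.DataFrame({"name": ["bobo", "amy", "amanda"], "line": [10, 5, 15]})

# Build one string per row, e.g. "(50, 10)"
df["big_line"] = [f"({5 * line}, {line})" for line in df["line"]]

# Parse each string back into a real tuple without using eval
df["big_line"] = df["big_line"].apply(ast.literal_eval)

print(df["big_line"].tolist())
```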
Upvotes: 1
Reputation: 260500
Input:
import pandas as pd
df = pd.DataFrame({'line': [10,5,15]}, index=['bobo', 'amy', 'amanda']).rename_axis(index='name')
line
name
bobo 10
amy 5
amanda 15
You can define a function that returns a Series:
def check(s):
    line = s['line']
    return pd.Series({'big_line': (line*5, line),
                      'small_line': line*2,
                      'massive_word': line*10,
                      'line_word': str(line)+' line'
                      })
Then apply it to the rows:
df.apply(check, axis=1)
Output:
big_line small_line massive_word line_word
name
bobo (50, 10) 20 100 10 line
amy (25, 5) 10 50 5 line
amanda (75, 15) 30 150 15 line
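Note that the result above holds only the new columns. To attach them to the original dataframe, one option is DataFrame.join, which aligns on the index. A minimal sketch reusing the check function from this answer (the names new_cols and result are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'line': [10, 5, 15]},
                  index=['bobo', 'amy', 'amanda']).rename_axis(index='name')

def check(s):
    # s is one row of the dataframe, passed in as a Series
    line = s['line']
    return pd.Series({'big_line': (line*5, line),
                      'small_line': line*2,
                      'massive_word': line*10,
                      'line_word': str(line) + ' line'})

# One Series per row becomes one row of a new dataframe
new_cols = df.apply(check, axis=1)

# Join on the shared index so the original columns are kept
result = df.join(new_cols)
print(result)
```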
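Alternatively, without a row-wise function, each column can be built directly from the line column:
df['big_line'] = df['line'].apply(lambda x: (5*x, x))
df['small_line'] = df['line']*2
df['massive_line'] = df['line']*10
df['line_word'] = df['line'].astype(str)+' line'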
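Since the question asks for something reusable across several dataframes, these column-wise assignments could be wrapped in a function. This is just a sketch: the name add_line_columns and the col parameter are hypothetical, and it returns a copy rather than modifying its input.

```python
import pandas as pd

def add_line_columns(df, col="line"):
    """Return a copy of df with the derived columns added."""
    out = df.copy()
    line = out[col]
    # apply is used only here, because each cell must hold a tuple
    out["big_line"] = line.apply(lambda x: (5 * x, x))
    # The remaining columns are plain vectorized operations
    out["small_line"] = line * 2
    out["massive_line"] = line * 10
    out["line_word"] = line.astype(str) + " line"
    return out

# Assumed sample data, mirroring the question's dataframe
df = pd.DataFrame({"name": ["bobo", "amy", "amanda"], "line": [10, 5, 15]})
result = add_line_columns(df)
print(result)
```

Returning a copy keeps the caller's dataframe unchanged; dropping the copy() call would instead add the columns in place, as the function in the question tries to do.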
Upvotes: 2