vish community

Reputation: 158

How to apply def function in full dataframe?

I need help correcting these functions. I am confused about two things:

  1. How to put a for loop inside a def function.
  2. How to correct my other function, which works only on a single column.

import numpy as np
import pandas as pd

raw_data = {'age1': [23, 45, 21], 'age2': [10, 20, 50]}
df = pd.DataFrame(raw_data, columns=['age1', 'age2'])
df

This loop works well:

l = list(df.columns)
for c in l:
  df[c]=np.where(df[c]>45,df[c]+100,df[c])

  1. This version does not work properly: it adds more than 100 to the value. What is wrong here?

def fun(x):
  l = list(df.columns)
  for c in l:
    df[c]=np.where(df[c]>45,df[c]+100,df[c])
  return x
df.apply(fun)

  2. Why can't I apply this function to the full DataFrame? Please correct it.

def f(x):
  val=[]
  if x>=40:
      val = x+100
  else:
      val = x
  return val
df.apply(f,axis=1)

Upvotes: 1

Views: 3294

Answers (2)

vish community

Reputation: 158

Filling and replacing NA column values based on column dtype

# Assumed intent: fill NaN with 0 in float64 columns and with '' in all others
df.transform(lambda x: x.fillna(0) if x.dtype == 'float64' else x.fillna(''))

df.transform(lambda x: x.replace('orange','juice') if x.dtype == 'object' else x.fillna(0))

Upvotes: 0

Henry Ecker

Reputation: 35676

The functions do different things.

The first option works because you're iterating over each column and applying np.where to each column once.

for c in df.columns:
    df[c] = np.where(df[c] > 45, df[c] + 100, df[c])

df:

   age1  age2
0    23    10
1    45    20
2    21   150

In this case:

def fun(x):
  l = list(df.columns)
  for c in l:
    df[c]=np.where(df[c]>45,df[c]+100,df[c])
  return x
df.apply(fun)

The function fun is called once for every column (via apply), but each call ignores its argument x and loops over all columns of the global df, so the complete operation runs again on every call.

This is roughly equivalent to:

for _ in df.columns:
    for c in df.columns:
        df[c] = np.where(df[c] > 45, df[c] + 100, df[c])

Notice the nested looping.

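That is why 100 is added twice: the value 50 becomes 150 on the first pass, and since 150 is still greater than 45, it becomes 250 on the second pass. The resulting df: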

   age1  age2
0    23    10
1    45    20
2    21   250
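
If you want to keep apply, one way to avoid the repeated work is to have the function operate only on the column it receives instead of on the global df. A minimal sketch, where the name add_100 is just illustrative and Series.mask stands in for np.where:

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})


def add_100(col):
    # col is the single column handed over by apply; the global df is not touched
    return col.mask(col > 45, col + 100)


# apply calls add_100 once per column and assembles the returned Series
result = df.apply(add_100)
print(result)
```

Because each column is processed exactly once, 50 becomes 150 and stays there, and the original df is left unchanged.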

The last option is close:

def f(x):
  val=[]
  if x>=40:
      val = x+100
  else:
      val = x
  return val

df.apply(f,axis=1)

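However, with axis=1, x is a Series holding one row of the DataFrame (without axis=1 it would be one column). That means x >= 40 produces a boolean Series rather than a single True or False, so the if statement raises an error: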

ValueError: The truth value of a Series is ambiguous. 
Use a.empty, a.bool(), a.item(), a.any() or a.all().
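
To see this for yourself, a quick check along these lines can help. It is only a sketch: the received, raised, and message variables are there purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

received = []


def f(x):
    # Record what apply actually passes in before the comparison fails
    received.append(type(x).__name__)
    if x >= 40:
        return x + 100
    return x


try:
    df.apply(f, axis=1)
    raised = False
    message = ''
except ValueError as exc:
    raised = True
    message = str(exc)

print(received[0])  # the type of the first argument f was called with
print(message)
```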

The function can be modified slightly and used with applymap, which applies it to every cell in the DataFrame, so x is a single value:

def f(x):
    if x > 45:  # Changed the bound to match the np.where condition
        val = x + 100
    else:
        val = x
    return val

df = df.applymap(f)

df:

   age1  age2
0    23    10
1    45    20
2    21   150
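
One caveat: in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which does the same element-wise job. A hedged sketch that picks whichever method the installed version provides (the elementwise name is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})


def f(x):
    # x is a single cell value here, so a plain if/else is fine
    return x + 100 if x > 45 else x


# Use DataFrame.map when this pandas version has it, otherwise fall back to applymap
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
result = elementwise(f)
print(result)
```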

However, the more idiomatic pandas approach here would be a vectorized method such as DataFrame.mask, which replaces values wherever the condition is True:

df = df.mask(df > 45, df + 100)

df:

   age1  age2
0    23    10
1    45    20
2    21   150
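
For comparison, here is a small sketch of two other vectorized spellings that should give the same values: DataFrame.where with the inverted condition, and np.where applied to the whole frame at once. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# mask replaces values where the condition is True
masked = df.mask(df > 45, df + 100)

# where keeps values where the condition is True, so the condition is inverted
kept = df.where(df <= 45, df + 100)

# np.where returns a plain ndarray, so the labels have to be restored by hand
via_numpy = pd.DataFrame(
    np.where(df > 45, df + 100, df), index=df.index, columns=df.columns
)

print(masked)
```

None of these needs a Python-level loop or apply, which usually matters once the DataFrame gets large.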

Upvotes: 3
