Reputation: 286

Iterating over rows and columns and replacing values based on a condition

How do I divide every numeric value in a pandas DataFrame that lies between 10 and 100 by 10?

Conditions:

The time column (or any other non-numeric column) should be ignored.
The numbers can be in any row or column.

time    n1    n2    n3    n4
11:50   1     2     3     `40`
12:50   5     6     `70`  8
13:50   `80`  7     6     500
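To see exactly which cells the rule touches, they can be located with a boolean frame built from the numeric columns only. A minimal sketch, assuming strict bounds (10 < x < 100), since the question does not say whether 10 and 100 themselves count; `numeric`, `in_range` and `hits` are just illustrative names:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

# Only numeric columns, so "time" is ignored automatically.
numeric = df1.select_dtypes(include="number")

# True wherever 10 < value < 100 (strict bounds assumed).
in_range = (numeric > 10) & (numeric < 100)

# np.nonzero gives the (row, column) positions of the True cells, row by row.
rows, cols = np.nonzero(in_range.to_numpy())
hits = [(in_range.index[r], in_range.columns[c]) for r, c in zip(rows, cols)]
print(hits)
```

This lists the three highlighted cells (40 in n4, 70 in n3, 80 in n1), wherever they sit in the frame.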

Code to build the sample DataFrame:


import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
        }

df1 = pd.DataFrame(data = data_1)
df1

Attempt 1 (it doesn't work):

j = 0
k = 0
for i in df:
    if df[j][k] > 10 and df[j][k] < 100:
        df[j][k] = df[j][k] / 10
        j = j + 1
    else:
        pass;
    k = k + 1
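The attempt above fails for several reasons: `df` is never defined (the setup creates `df1`); `for i in df` iterates over column labels, not cells; even with `df1`, `df1[j][k]` with `j = 0` looks up a column labelled `0`, which raises a `KeyError`; and chained indexing such as `df[col][row] = ...` may not write back into the original frame at all. If an explicit loop is really wanted, here is a sketch using positional `.iat` access. It is slow on large frames, but closest to the original idea; floor division `//` is used (as in the answers below) so the columns stay integers:

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

# Loop only over numeric columns, so "time" is skipped.
num_cols = df1.select_dtypes(include="number").columns

for col in num_cols:
    j = df1.columns.get_loc(col)   # column position
    for i in range(len(df1)):      # row position
        value = df1.iat[i, j]
        if 10 < value < 100:
            # .iat reads and writes one cell directly, avoiding chained indexing.
            df1.iat[i, j] = value // 10

print(df1)
```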

Expected Result:

Since 80, 70 and 40 are the only numbers between 10 and 100, each of them is replaced by x/10 in the same dataframe:

80 --> 80/10 = 8
70 --> 70/10 = 7
40 --> 40/10 = 4

The entire time column is ignored because it is non-numeric.
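To check a candidate solution against this expected result, one option is to build the expected frame explicitly and compare with `pandas.testing.assert_frame_equal`. A sketch; `make_df`, `expected` and `check` are hypothetical helper names:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

TIME = ["11:50", "12:50", "13:50"]

def make_df():
    # Fresh copy of the question's sample data.
    return pd.DataFrame({
        "time": TIME,
        "n1": [1, 5, 80],
        "n2": [2, 6, 7],
        "n3": [3, 70, 6],
        "n4": [40, 8, 500],
    })

# Expected result from the question: 80, 70 and 40 become 8, 7 and 4.
expected = pd.DataFrame({
    "time": TIME,
    "n1": [1, 5, 8],
    "n2": [2, 6, 7],
    "n3": [3, 7, 6],
    "n4": [4, 8, 500],
})

def check(result):
    # Raises AssertionError with a message describing the mismatch.
    assert_frame_equal(result, expected)

# Example: a frame edited by hand to match passes silently.
candidate = make_df()
candidate.loc[0, "n4"] = 4
candidate.loc[1, "n3"] = 7
candidate.loc[2, "n1"] = 8
check(candidate)
```

`assert_frame_equal` also compares dtypes, so a solution using true division (`/`) produces float64 columns and fails this check unless `check_dtype=False` is passed.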

Upvotes: 2

Answers (4)

Rodalm

Reputation: 5503

DataFrame.applymap is slow on large datasets and doesn't scale well, because it calls a Python function once per element. Look for a vectorized solution whenever possible.

In this case, you can mask the values between 10 and 100 and perform the conditional replacement using DataFrame.mask (or DataFrame.where if you negate the condition).

# select the numeric columns
num_cols = df1.select_dtypes(include="number").columns

# When cond and other are callables, DataFrame.mask calls them with the
# calling DataFrame (here df1[num_cols]) as the `df` argument
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100), 
                       lambda df: df // 10)
)

Output:

>>> df1

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500

Setup:

import pandas as pd

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
        }

df1 = pd.DataFrame(data = data_1)
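As the answer notes, DataFrame.where works too if the condition is negated. A minimal sketch of that variant on the same setup (`sub` and `keep` are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

num_cols = df1.select_dtypes(include="number").columns
sub = df1[num_cols]

# where() keeps values where the condition is True and replaces the rest,
# so its condition is the negation of the one passed to mask().
keep = ~((sub > 10) & (sub < 100))
df1[num_cols] = sub.where(keep, sub // 10)

print(df1)
```

mask and where are mirror images, so the choice is only about which condition reads more naturally.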

Upvotes: 2

Dejene T.

Reputation: 989

Try the following:

def repl(df, cols):
    # Note: modifies df in place (and returns it); the bounds here are inclusive
    for col in cols:
        df[col] = df[col].apply(lambda x: x // 10 if 10 <= x <= 100 else x)
    return df

new_df = repl(df1, ['n1', 'n2', 'n3', 'n4'])
new_df

Output:

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
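Note that this answer treats 10 and 100 as "between" (inclusive), while the mask and applymap answers use strict bounds. The sample data contains neither value, so the outputs agree here, but they differ at the edges. A sketch using Series.between, whose string `inclusive` argument (assumed pandas >= 1.3) makes the choice explicit; `divide_in_range` is a hypothetical helper that works on a copy instead of mutating its input:

```python
import pandas as pd

def divide_in_range(df, inclusive="neither"):
    # Works on a copy instead of mutating the caller's frame (unlike repl()).
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        # inclusive="neither" -> 10 < x < 100, inclusive="both" -> 10 <= x <= 100
        hit = out[col].between(10, 100, inclusive=inclusive)
        out.loc[hit, col] = out.loc[hit, col] // 10
    return out

# Edge cases that the sample data does not contain.
edge = pd.DataFrame({"time": ["a", "b", "c"], "n": [10, 50, 100]})
print(divide_in_range(edge, "neither")["n"].tolist())  # [10, 5, 100]
print(divide_in_range(edge, "both")["n"].tolist())     # [1, 5, 10]
```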

Upvotes: 1

BrokenBenchmark

Reputation: 19252

You can select the numeric columns, use .applymap() to apply the (floor) division element-wise, and then assign the result back to the original dataframe. Notably, this doesn't require hardcoding the columns to transform in advance:

numerics = df1.select_dtypes(include="number")
numerics = numerics.applymap(lambda x: x // 10 if 10 < x < 100 else x)
df1[numerics.columns] = numerics

This outputs:

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
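In pandas 2.1, DataFrame.applymap was renamed to DataFrame.map and the old name deprecated, so newer versions emit a FutureWarning for the code above. A sketch that picks whichever name the installed pandas provides; `shrink` is a hypothetical helper holding the same rule as the answer:

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

def shrink(x):
    # Same rule as the answer: floor-divide values strictly between 10 and 100.
    return x // 10 if 10 < x < 100 else x

numerics = df1.select_dtypes(include="number")

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    numerics = numerics.map(shrink)
else:
    numerics = numerics.applymap(shrink)

df1[numerics.columns] = numerics
print(df1)
```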

Upvotes: 1

Karthik S

Reputation: 11546

Does this work?

df1[['n1','n2','n3','n4']].applymap(lambda x : x/10 if 10 < x < 100 else x)
    n1  n2   n3     n4
0  1.0   2  3.0    4.0
1  5.0   6  7.0    8.0
2  8.0   7  6.0  500.0
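This expression returns a new DataFrame and leaves df1 unchanged. Because `x / 10` yields floats, pandas infers float64 for every column where at least one value was divided, while n2 stays int64 since none of its values is in range. That is why the output mixes `1.0` and `2`. A sketch that assigns the result back (`shrink` is a hypothetical helper; DataFrame.map is the pandas >= 2.1 name for applymap):

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})
cols = ["n1", "n2", "n3", "n4"]

def shrink(x):
    # True division, as in the answer: divided values become floats.
    return x / 10 if 10 < x < 100 else x

# The element-wise call builds a new DataFrame; assign it back to update df1.
sub = df1[cols]
result = sub.map(shrink) if hasattr(pd.DataFrame, "map") else sub.applymap(shrink)
df1[cols] = result

print(result.dtypes)
```

If integer output like the question's expected result is wanted, floor division `//` (as the other answers use) keeps int64, at the cost of truncating values that are not multiples of 10 (for example, 45 // 10 is 4, not 4.5).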

Upvotes: 1
