Reputation: 1679

Efficient way to conditionally modify columns in a pandas dataframe, row by row

I have a dataframe that looks like this:

length      code1    code2    code3
4            0         1        1
8            1         1        0
7            1         0        0

I want to write a function that checks the value in length. If the value is >= 7, I want to add 1 to the value present in code2 and code3. What is the best way to do this? So far, I have:

def char_count_pred(df):
    if df.length >= 7:
        df.code2 += 1
        df.code3 += 1

    return df


master_df = char_count_pred(master_df)

I understand I need to build a loop to iterate over each row, but I am confused about the most efficient way to loop through the rows and perform tasks on multiple columns.

edit

When trying the solutions below, I get the same errors:

When I try the script as is....


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2889             try:
-> 2890                 return self._engine.get_loc(key)
   2891             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: True

When I assign the result of the script to a variable...


  File "<ipython-input-140-9f2f40a5bb96>", line 3
    df = df.loc[df.length>=7]+=1
                                                                                   ^
SyntaxError: invalid syntax

Upvotes: 0

Answers (4)

coco18

Reputation: 1085

I hope this will help:

EDIT
This is everything I am using.

Definition of your dataframe:

import pandas as pd

df = pd.DataFrame(columns=["length","code1","code2","code3"],
                  data=[[4,0,1,1],
                        [8,1,1,0],
                        [7,1,0,0]])

Definition of the function:

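def char_count_pred(df):
    # Add 1 to code2 and code3 in the rows where length is at least 7.
    # A single .loc call avoids chained indexing (df[col].loc[...] += 1),
    # which may modify a temporary copy instead of df itself.
    df.loc[df["length"] >= 7, ["code2", "code3"]] += 1

char_count_pred(df)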

Everything works for me, so I don't know where the problem is.

Upvotes: 1

Thomas Kimber

Reputation: 11067

You could use apply with axis=1, which is a flexible way of modifying your columns based on some function, although it is generally slower than a vectorized .loc assignment on large dataframes.

There's an answer here you could take a look at, or - try something like this as a template for your specific use-case:

master_df["code2"] = master_df.apply(lambda x : x["code2"] + 1 if x["length"] >= 7 else x["code2"], axis=1)

This updates your "code2" field by applying a function to each row (in this case an anonymous lambda function, but it could equally be a named function as per your def). The only limitation is that it is simpler if such functions target a single column at a time.

There are methods for updating several columns at once, but it might be simpler to start off by updating a single column at a time.
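
For the multi-column case, one possible sketch is to have apply return the whole modified row. This assumes the sample data from the question; the helper name bump_codes is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
master_df = pd.DataFrame(
    {"length": [4, 8, 7], "code1": [0, 1, 1], "code2": [1, 1, 0], "code3": [1, 0, 0]}
)

def bump_codes(row):
    # Hypothetical helper: work on a copy so the original row is not mutated.
    row = row.copy()
    if row["length"] >= 7:
        row["code2"] += 1
        row["code3"] += 1
    return row

# axis=1 passes each row to bump_codes as a Series and rebuilds a DataFrame
# from the returned rows.
master_df = master_df.apply(bump_codes, axis=1)
print(master_df)
```

This calls the Python function once per row, so it is mainly worth using when the per-row logic is too awkward to express as a boolean mask.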

Upvotes: 0

Ben Pap

Reputation: 2579

df.loc[df['length']>=7, 'code2':] += 1

Use .loc to select the rows where length is greater than or equal to 7, then select the columns from code2 onwards and add 1 to them.
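
As a small runnable check of that one-liner, here is a sketch on the sample data from the question. The intermediate variable mask is only there for readability.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    columns=["length", "code1", "code2", "code3"],
    data=[[4, 0, 1, 1], [8, 1, 1, 0], [7, 1, 0, 0]],
)

# Boolean mask: True for the rows whose length is at least 7.
mask = df["length"] >= 7

# The label slice "code2": selects code2 and every column to its right,
# which is only code3 for this column order.
df.loc[mask, "code2":] += 1

print(df)
```

Note that the slice depends on column order. If other columns could appear after code3, listing the targets explicitly as ['code2', 'code3'] is the safer choice.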

Upvotes: 2

Chris

Reputation: 16147

df.loc[df.length >= 7, ['code2', 'code3']] += 1

Upvotes: 3
