Data Enthusiast
Data Enthusiast

Reputation: 541

Use of loc to update a dataframe python pandas

I have a pandas dataframe (df) with the column structure :

month a b c d

this dataframe has data for say Jan, Feb, Mar, Apr. A,B,C,D are numeric columns. For the month of Feb , I want to recalculate column A and update it in the dataframe i.e. for month = Feb, A = B + C + D

Code I used :

 df[df['month']=='Feb']['A']=df[df['month']=='Feb']['B'] + df[df['month']=='Feb']['C'] + df[df['month']=='Feb']['D'] 

This ran without errors but did not change the values in column A for the month Feb. In the console, it gave a message that :

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I tried to use .loc but right now the dataframe I am working on, I had used .reset_index() on it and I am not sure how to set index and use .loc. I followed documentation but not clear. Could you please help me out here? This is an example dataframe :

 import pandas as pd import numpy as np
 dates = pd.date_range('1/1/2000', periods=8)
 df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D']) 

I want to update say one date : 2000-01-03. I am unable to give the snippet of my data as it is real time data.

Upvotes: 19

Views: 50744

Answers (3)

Sairam Krish
Sairam Krish

Reputation: 11701

With dataframe loc, we can perform data updation. The second parameter in loc can take in an array. It's not well documented. Using this, we can perform multi column update.

Making use of this, we can use a dictionary of attributes that needs to be updated.

def update_df(
    df: pd.DataFrame, query: str, update_attributes: Dict[str, Any]
) -> pd.DataFrame:
    """
    Update the dataframe based on the query and update_attributes.

    Args:
        df (pd.DataFrame): Input dataframe
        query (str): Query to filter the records
        update_attributes (Dict[str, Any]): Attributes to update

    Returns:
        pd.DataFrame: returns the updated dataframe
    """
    df.loc[df.query(query).index, list(update_attributes.keys())] = list(
        update_attributes.values()
    )

    return df

Usage

    df = update_df(df, "column1 == 'value1'", {
             "column2": "new_value", 
             "column3": "new_value"
    })

Upvotes: 0

Anton Protopopov
Anton Protopopov

Reputation: 31672

As you could see from the warning you should use loc[row_index, col_index]. When you subsetting your data you get index values. You just need to pass for row_index and then with comma col_name:

df.loc[df['month'] == 'Feb', 'A'] = df.loc[df['month'] == 'Feb', 'B'] + df.loc[df['month'] == 'Feb', 'C'] + df.loc[df['month'] == 'Feb', 'D'] 

Upvotes: 27

DeepSpace
DeepSpace

Reputation: 81604

While not being the most beautiful, the way I would achieve your goal (without explicitly iterating over the rows) is:

df.ix[df['month'] == 'Feb', 'a'] = df[df['month'] == 'Feb']['b'] + df[df['month'] == 'Feb']['c']  

Note: ix has been deprecated since Pandas v0.20.0 in favour of iloc / loc.

Upvotes: 1

Related Questions