user9927059
user9927059

Reputation: 63

Pandas | Copy Dataframe Values Based on Condition

I have a DataFrame, X, in the following format:

Value1   Value2   Value3   Month   Area

   1       1        3        0      1
   4       10       45       1      1
   ..      ..       ..       ..     .. 
   15      11       10       34     1
   10      21       12       35     1
   1       2        2        0      2
   3       4        4        1      2
   ..      ..       ..       ..     .. 
   27      39       21       34     2
   23      42       32       35     2
   ..      ..       ..       ..     .. 

From the DataFrame X, I want to create a DataFrame Y, with the same columns except for Month and Area. The rows of Y, need to be relative to the Area column but have the column values for the row of X+1.

For example, the first row of Y will contain the value of the second row of X, if the area is 1.

Value1   Value2   Value3   (Index) (Area)   

   4       10       45        0     1
   ..      ..       ..       ..     .. 
   10      21       12       34     1
   3       4        4        35     2
   ..      ..       ..       ..     .. 
   23      42       32       69     2
   ..      ..       ..       ..     ..

Then, I need to remove the rows in X, which contains the last Month Value, for every Area.

The DataFrame X would then be converted into something like the following:

Value1   Value2   Value3   Month   Area

   1       1        3        0      1
   4       10       45       1      1
   ..      ..       ..       ..     .. 
   15      11       10       34     1
   1       2        2        0      2
   3       4        4        1      2
   ..      ..       ..       ..     .. 
   27      39       21       34     2
   ..      ..       ..       ..     .. 

This is my code at the moment:

#Define a list of indexes of X to Drop
to_drop = list()
prev = None
y_index = 0
    
    
    #Iterate through X
    for index, row in x.iterrows():
        if not prev is None:
            
            #If the Area is the same as before
            if row['Area'] == prev :
                #Append to the Y dataframe and increase y's index
                y.loc[y_index] = row[y_columns]
                y_index+=1
            #We found a new index. We want to delete the previous one, which will have the last 
            #month value for the previous area
            else: to_drop.append(prev_index)
            
        prev_index= index
        prev = row['Area']
    
    print("Removing some rows...")
    x = x.drop(x.index[to_drop])

However, when I check the new DataFrames, they seem to be losing a lot of information, such as full Areas being wiped out. Any idea for a better approach, and what I could be doing wrong?

Please note, this is an explicative example. The Dataset is made of several thousand Area values, with around 30 Month Values.

Thanks in advance.

Upvotes: 2

Views: 410

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

Try this:

df['Month'] = df['Month'].shift()
df = df.dropna(subset=['Month'], axis=0)
df[df.duplicated('Area', keep='last')]

Older solution

Does drop_duplicates work for you?

df.drop_duplicates(['Area'], keep='first')

Output:

   Value1  Value2  Value3  Month  Area
0       1       1       3      0     1
2       1       2       2      0     2

And,

df.drop_duplicates(['Area'], keep='last')

Output:

   Value1  Value2  Value3  Month  Area
1       4      10      45      1     1
3       3       4       4      1     2

Update, drops the last records in each group:

df[df.duplicated('Area', keep='last')]

Upvotes: 0

Related Questions