I am new to pandas module. And have a quick question on data manipulation: Suppose I have a table as follows: Tool | WeekNumber | Status | Percentage -----|------------|--------|------------ M1 | 1 | good | 85 M1 | 4 | bad | 75 M1 | 7 | good | 90 Based on the condition in Status, I would like to add percentage. For example: if the status is "good", then the following rows for subsequent week numbers should be all 100, i.e., next rows should be for weeks 2 and 3 with 100% if the status is 'bad', the percentage should be 0 for the following week numbers, i.e., 0 for the weeks 5 and 6. I have some idea on how to approach with the condition, but no idea to add rows: import os, re import pandas as pd df = pd.read_excel("test.xlsx") add_rows = [] for elem in df.Status: if elem == "good": add_rows.append(100) if elem == "bad": add_rows.append(0) df.Percent = pd.Series(add_rows) However, this only gives me three values based on the condition, and changes the values of specific week numbers. But I want the following: Tool | WeekNumber | Status | Percentage -----|------------|--------|------------ M1 | 1 | good | 85 M1 | 2 | good | 100 M1 | 3 | good | 100 M1 | 4 | bad | 75 M1 | 5 | bad | 0 M1 | 6 | bad | 0 M1 | 7 | good | 90

Reputation: 465

Adding rows based on previous condition in another column

I am new to pandas module. And have a quick question on data manipulation:

Suppose I have a table as follows:

Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
  M1 |     1      |   good |     85
  M1 |     4      |   bad  |     75
  M1 |     7      |   good |     90

Based on the condition in Status, I would like to add percentage.

For example:

if the status is "good", then the following rows for subsequent week numbers should be all 100, i.e., next rows should be for weeks 2 and 3 with 100%
if the status is 'bad', the percentage should be 0 for the following week numbers, i.e., 0 for the weeks 5 and 6.

I have some idea on how to approach with the condition, but no idea to add rows:

import os, re
import pandas as pd
df = pd.read_excel("test.xlsx")

add_rows = []
for elem in df.Status:
    if elem == "good":
        add_rows.append(100)
    if elem == "bad":
        add_rows.append(0)

df.Percent = pd.Series(add_rows)

However, this only gives me three values based on the condition, and changes the values of specific week numbers. But I want the following:

Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
  M1 |     1      |   good |     85
  M1 |     2      |   good |     100
  M1 |     3      |   good |     100
  M1 |     4      |   bad  |     75
  M1 |     5      |   bad  |      0
  M1 |     6      |   bad  |      0
  M1 |     7      |   good |     90

Upvotes: 2

Answers (5)

Zero

Reputation: 76947

You could do expand the dataframe first with set_index and reindex and fill the NaNs in Tool and Status

In [814]: dff = (df.set_index('WeekNumber')
                   .reindex(range(df.WeekNumber.min(), df.WeekNumber.max()+1))
                   .assign(Tool=lambda x: x.Tool.ffill(),
                           Status=lambda x: x.Status.ffill()))

In [815]: dff
Out[815]:
           Tool Status  Percentage
WeekNumber
1            M1   good        85.0
2            M1   good         NaN
3            M1   good         NaN
4            M1    bad        75.0
5            M1    bad         NaN
6            M1    bad         NaN
7            M1   good        90.0

Then, conditionally, fill Percentage the values

In [816]: dff.loc[(dff.Status == 'good') & dff.Percentage.isnull(), 'Percentage'] = 100

In [817]: dff.loc[(dff.Status == 'bad') & dff.Percentage.isnull(), 'Percentage'] = 0

Finally use reset_index()

In [818]: dff.reset_index()
Out[818]:
   WeekNumber Tool Status  Percentage
0           1   M1   good        85.0
1           2   M1   good       100.0
2           3   M1   good       100.0
3           4   M1    bad        75.0
4           5   M1    bad         0.0
5           6   M1    bad         0.0
6           7   M1   good        90.0

Upvotes: 0

Vaishali

Reputation: 38415

Here's another one

val = pd.DataFrame({'WeekNumber':np.arange(df['WeekNumber'].min(), df['WeekNumber'].max()+ 1, 1)})
new_df = df.merge(val, on='WeekNumber', how = 'outer').sort_values(by = 'WeekNumber').reset_index(drop = True)
new_df[['Tool', 'Status']] = new_df[['Tool', 'Status']].ffill()
new_df['Percentage'] = np.where((new_df['Status'] == 'good') & 
new_df['Percentage'].isnull(), 100, new_df['Percentage'])
new_df['Percentage'] = new_df['Percentage'].fillna(0)

You get

    Tool    WeekNumber  Status  Percentage
0   M1      1           good    85.0
1   M1      2           good    100.0
2   M1      3           good    100.0
3   M1      4           bad     75.0
4   M1      5           bad     0.0
5   M1      6           bad     0.0
6   M1      7           good    90.0

Upvotes: 2

BENY

Reputation: 323326

Try this ?

df=df.set_index('WeekNumber').reindex(range(1,8))
df.Tool.fillna('M1',inplace=True)
df.Status=df.Status.ffill()
df.Percentage.fillna(0,inplace=True)
df.Percentage=np.where((df.Status=='good')&(df.Percentage==0),100,df.Percentage)
df.reset_index()


Out[80]: 
   WeekNumber Tool Status  Percentage
0           1   M1   good        85.0
1           2   M1   good       100.0
2           3   M1   good       100.0
3           4   M1    bad        75.0
4           5   M1    bad         0.0

Upvotes: 0

s3bw

Reputation: 3049

Do your answer like this:

add_rows = []
for index, elem in enumerate(df.Status):
    if elem == "good":

        # assuming data is sorted by 'WeekNumber'
        add_rows.append({'Tool': 'M1', 'WeekNumber': index + 2}) # etc
        add_rows.append({'Tool': 'M1', 'WeekNumber': index + 3}) # etc

more_data = pd.DataFrame(add_rows)
df = pd.concat([df, more_data]).sort_values(by='WeekNumber')

Upvotes: 1

s3bw

Reputation: 3049

You can use .iterrows() to iterate through each row.

for index, row in df.iterrows():
    print row.Status

>>> good
>>> bad
>>> good

If I needed it to work using some rough code, I'd use my code:

new_index = 0
new_dict = {}
for index, row in df.iterrows():
    use_index = index + new_index

    new_row[use_index] = {}

    new_row[use_index]= {
        'Tool': row.Tool,
        'WeekNumber': row.WeekNumber,
        'Status': row.Status,
        'Percentage': row.Percentage,
    }

    if row.Percentage == 100:
        for n in range(2):
            add_index = index + 1 + new_index

            new_dict[add_index] = {}

            new_row[add_index]= {
                'Tool': 'M1',
                'WeekNumber': row.WeekNumber + n,
                'Status': 'good',
                'Percentage': 100,
            }

            new_index += 1

df = pd.DataFrame(new_dict)

Upvotes: 1

Adding rows based on previous condition in another column

Answers (5)

Related Questions