Reputation: 465
I am new to pandas module. And have a quick question on data manipulation:
Suppose I have a table as follows:
Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
M1 | 1 | good | 85
M1 | 4 | bad | 75
M1 | 7 | good | 90
Based on the condition in Status, I would like to add percentage.
For example:
if the status is "good", then the following rows for subsequent week numbers should be all 100, i.e., next rows should be for weeks 2 and 3 with 100%
if the status is 'bad', the percentage should be 0 for the following week numbers, i.e., 0 for the weeks 5 and 6.
I have some idea on how to approach with the condition, but no idea to add rows:
import os, re
import pandas as pd
df = pd.read_excel("test.xlsx")
add_rows = []
for elem in df.Status:
if elem == "good":
add_rows.append(100)
if elem == "bad":
add_rows.append(0)
df.Percent = pd.Series(add_rows)
However, this only gives me three values based on the condition, and changes the values of specific week numbers. But I want the following:
Tool | WeekNumber | Status | Percentage
-----|------------|--------|------------
M1 | 1 | good | 85
M1 | 2 | good | 100
M1 | 3 | good | 100
M1 | 4 | bad | 75
M1 | 5 | bad | 0
M1 | 6 | bad | 0
M1 | 7 | good | 90
Upvotes: 2
Views: 2163
Reputation: 76947
You could do expand the dataframe first with set_index
and reindex
and fill the NaN
s in Tool
and Status
In [814]: dff = (df.set_index('WeekNumber')
.reindex(range(df.WeekNumber.min(), df.WeekNumber.max()+1))
.assign(Tool=lambda x: x.Tool.ffill(),
Status=lambda x: x.Status.ffill()))
In [815]: dff
Out[815]:
Tool Status Percentage
WeekNumber
1 M1 good 85.0
2 M1 good NaN
3 M1 good NaN
4 M1 bad 75.0
5 M1 bad NaN
6 M1 bad NaN
7 M1 good 90.0
Then, conditionally, fill Percentage
the values
In [816]: dff.loc[(dff.Status == 'good') & dff.Percentage.isnull(), 'Percentage'] = 100
In [817]: dff.loc[(dff.Status == 'bad') & dff.Percentage.isnull(), 'Percentage'] = 0
Finally use reset_index()
In [818]: dff.reset_index()
Out[818]:
WeekNumber Tool Status Percentage
0 1 M1 good 85.0
1 2 M1 good 100.0
2 3 M1 good 100.0
3 4 M1 bad 75.0
4 5 M1 bad 0.0
5 6 M1 bad 0.0
6 7 M1 good 90.0
Upvotes: 0
Reputation: 38415
Here's another one
val = pd.DataFrame({'WeekNumber':np.arange(df['WeekNumber'].min(), df['WeekNumber'].max()+ 1, 1)})
new_df = df.merge(val, on='WeekNumber', how = 'outer').sort_values(by = 'WeekNumber').reset_index(drop = True)
new_df[['Tool', 'Status']] = new_df[['Tool', 'Status']].ffill()
new_df['Percentage'] = np.where((new_df['Status'] == 'good') &
new_df['Percentage'].isnull(), 100, new_df['Percentage'])
new_df['Percentage'] = new_df['Percentage'].fillna(0)
You get
Tool WeekNumber Status Percentage
0 M1 1 good 85.0
1 M1 2 good 100.0
2 M1 3 good 100.0
3 M1 4 bad 75.0
4 M1 5 bad 0.0
5 M1 6 bad 0.0
6 M1 7 good 90.0
Upvotes: 2
Reputation: 323326
Try this ?
df=df.set_index('WeekNumber').reindex(range(1,8))
df.Tool.fillna('M1',inplace=True)
df.Status=df.Status.ffill()
df.Percentage.fillna(0,inplace=True)
df.Percentage=np.where((df.Status=='good')&(df.Percentage==0),100,df.Percentage)
df.reset_index()
Out[80]:
WeekNumber Tool Status Percentage
0 1 M1 good 85.0
1 2 M1 good 100.0
2 3 M1 good 100.0
3 4 M1 bad 75.0
4 5 M1 bad 0.0
Upvotes: 0
Reputation: 3049
Do your answer like this:
add_rows = []
for index, elem in enumerate(df.Status):
if elem == "good":
# assuming data is sorted by 'WeekNumber'
add_rows.append({'Tool': 'M1', 'WeekNumber': index + 2}) # etc
add_rows.append({'Tool': 'M1', 'WeekNumber': index + 3}) # etc
more_data = pd.DataFrame(add_rows)
df = pd.concat([df, more_data]).sort_values(by='WeekNumber')
Upvotes: 1
Reputation: 3049
You can use .iterrows()
to iterate through each row.
for index, row in df.iterrows():
print row.Status
>>> good
>>> bad
>>> good
If I needed it to work using some rough code, I'd use my code:
new_index = 0
new_dict = {}
for index, row in df.iterrows():
use_index = index + new_index
new_row[use_index] = {}
new_row[use_index]= {
'Tool': row.Tool,
'WeekNumber': row.WeekNumber,
'Status': row.Status,
'Percentage': row.Percentage,
}
if row.Percentage == 100:
for n in range(2):
add_index = index + 1 + new_index
new_dict[add_index] = {}
new_row[add_index]= {
'Tool': 'M1',
'WeekNumber': row.WeekNumber + n,
'Status': 'good',
'Percentage': 100,
}
new_index += 1
df = pd.DataFrame(new_dict)
Upvotes: 1