Reputation: 51
I am trying to insert new rows into an Excel sheet using a pandas DataFrame when a particular column meets a specific condition.
For example:
Input
A B C D E
0 AA 111 2 2
1 CC 222 8 12
2 DD 333 3 3
Output
A B C D E (Output Column)
0 AA 111 2 2 111-2
1 CC 222 8 8 222-8
2 CC 222 9 9 222-9
3 CC 222 10 10 222-10
4 CC 222 11 11 222-11
5 CC 222 12 12 222-12
6 DD 333 3 3 333-3
Here, columns C and D define a range of 8 to 12 for row 1, so that row needs to be split into one row per value in the range. If C and D are the same, no new rows should be added.
Upvotes: 3
Views: 1250
Reputation: 549
import pandas as pd

df = pd.DataFrame(
    data={
        'A': ['AA', 'CC', 'DD'],
        'B': [111, 222, 333],
        'C': [2, 8, 3],
        'D': [2, 12, 3],
        'E': [None, None, None],
    }
)

# Collect the output rows in a list and build the frame once at the end.
# DataFrame.append was removed in pandas 2.0, and growing a frame row by row is slow.
rows = []
for idx, row in df.iterrows():
    start, stop = int(row['C']), int(row['D'])
    # range(start, stop + 1) yields a single value when C == D
    # and nothing when D < C (such rows are skipped).
    for value in range(start, stop + 1):
        rows.append({
            'A': row['A'],
            'B': int(row['B']),
            'C': value,  # C follows the range, as in the expected output
            'D': value,
            'E': str(row['B']) + '-' + str(value),
        })

new_df = pd.DataFrame(rows, columns=['A', 'B', 'C', 'D', 'E'])
print(new_df)
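The same per-row expansion can also be sketched without an explicit loop over rows. This is not the code from the answer above but a different technique, DataFrame.explode (available since pandas 0.25), and it assumes every row has D >= C. The name expanded is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['AA', 'CC', 'DD'],
    'B': [111, 222, 333],
    'C': [2, 8, 3],
    'D': [2, 12, 3],
})

# Replace C with the list of values each row should expand to,
# e.g. [8, 9, 10, 11, 12] for the CC row.
expanded = df.assign(
    C=[list(range(c, d + 1)) for c, d in zip(df['C'], df['D'])]
)

# explode turns every list element into its own row; the other columns are repeated.
expanded = expanded.explode('C').reset_index(drop=True)

# explode leaves C with object dtype, so cast it back to integers.
# Assumption: no row has D < C (an empty list would become NaN here).
expanded['C'] = expanded['C'].astype('int64')
expanded['D'] = expanded['C']
expanded['E'] = expanded['B'].astype(str) + '-' + expanded['C'].astype(str)

print(expanded)
```

For large inputs this usually avoids most of the Python-level overhead of iterrows, although the list comprehension still runs once per input row.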
Upvotes: 1
Reputation: 18647
Another solution, using Index.repeat to create the output frame, then groupby.cumcount and str concatenation to update the values of columns C, D and E:
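# Repeat each row (D - C + 1) times; take a copy so the assignments below modify an independent frame.
df1 = df.loc[df.index.repeat((df.D - df.C).add(1))].copy()
# Number the copies of each original row 0, 1, 2, ... and add that offset to C.
# Grouping on the index (level=0) also works when values in A are not unique.
df1['C'] = df1['C'] + df1.groupby(level=0).cumcount()
df1['D'] = df1['C']
df1['E'] = df1['B'].astype(str) + '-' + df1['C'].astype(str)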
[out]
A B C D E
0 AA 111 2 2 111-2
1 CC 222 8 8 222-8
1 CC 222 9 9 222-9
1 CC 222 10 10 222-10
1 CC 222 11 11 222-11
1 CC 222 12 12 222-12
2 DD 333 3 3 333-3
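To make the two building blocks in this answer easier to follow, here is a minimal sketch of what Index.repeat and cumcount produce on their own. The names idx, counts, repeated and offsets are illustrative only, and the counts are the D - C + 1 values for the sample data.

```python
import pandas as pd

idx = pd.Index([0, 1, 2])
counts = [1, 5, 1]  # D - C + 1 for the rows AA, CC, DD

# Each label is repeated as many times as its count.
repeated = idx.repeat(counts)
print(repeated.tolist())  # [0, 1, 1, 1, 1, 1, 2]

# cumcount numbers the members of each group starting from 0,
# which gives the offset to add to C for every repeated row.
offsets = pd.Series(0, index=repeated).groupby(level=0).cumcount()
print(offsets.tolist())  # [0, 0, 1, 2, 3, 4, 0]
```

The repeated index labels are why the output above shows 1 five times. Calling reset_index(drop=True) on the final frame would renumber the rows 0 to 6, as in the question's desired output.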
Upvotes: 6
Reputation: 5740
My example takes the rows where columns C and D differ and creates the new rows for them, then adds these new rows to the rows that have no difference. Note that the unchanged rows come first in the result, so the original row order is not preserved.
import pandas as pd

# set up the data
data_raw = [['AA', 111, 2, 2], ['CC', 222, 8, 12], ['DD', 333, 3, 3]]
data = pd.DataFrame(data_raw, columns=['A', 'B', 'C', 'D'])

# rows where C and D are equal need no splitting
rest_of_data = data.loc[data['C'] == data['D']].copy()
# build E per row (using .values[0] would give every row the first row's value)
rest_of_data['E'] = rest_of_data['B'].astype(str) + '-' + rest_of_data['C'].astype(str)

# rows where C and D differ
difference_data = data.loc[data['C'] != data['D']]

# create one new row for every value from C to D inclusive;
# iterating over the rows also handles more than one row with a range
create_data = []
for row in difference_data.itertuples(index=False):
    for i in range(row.C, row.D + 1):
        create_data.append([row.A, row.B, i, i, str(row.B) + '-' + str(i)])
new_data = pd.DataFrame(create_data, columns=['A', 'B', 'C', 'D', 'E'])

# concatenate frames
frames = [rest_of_data, new_data]
result = pd.concat(frames, ignore_index=True)
Result:
A B C D E
0 AA 111 2 2 111-2
1 DD 333 3 3 333-3
2 CC 222 8 8 222-8
3 CC 222 9 9 222-9
4 CC 222 10 10 222-10
5 CC 222 11 11 222-11
6 CC 222 12 12 222-12
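The result above lists the unchanged rows before the expanded ones. If the original row order matters, one possible variation (a sketch, not part of the original answer) is to remember which input row each new row came from and then sort on that label with a stable sort. The names same, differ, labels, expanded and combined are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    [['AA', 111, 2, 2], ['CC', 222, 8, 12], ['DD', 333, 3, 3]],
    columns=['A', 'B', 'C', 'D'],
)

same = data.loc[data['C'] == data['D']].copy()
differ = data.loc[data['C'] != data['D']]

rows, labels = [], []
for label, row in zip(differ.index, differ.itertuples(index=False)):
    for i in range(row.C, row.D + 1):
        rows.append([row.A, row.B, i, i])
        labels.append(label)  # index label of the input row this came from
expanded = pd.DataFrame(rows, columns=['A', 'B', 'C', 'D'], index=labels)

# mergesort is stable, so rows sharing a label keep their 8, 9, 10, ... order.
combined = pd.concat([same, expanded]).sort_index(kind='mergesort')
combined = combined.reset_index(drop=True)
combined['E'] = combined['B'].astype(str) + '-' + combined['C'].astype(str)

print(combined)
```

Computing E once after the concatenation also removes the need to build it separately for the two partial frames.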
Upvotes: 2