Reputation: 121
I have a csv file as follows:
name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
What I am trying to do is whenever the column field is 0, add the field length to column and row should be decremented by 1. Also if there is a subsequent line where column is again 0, then add the first field length to the column, decrease row by 1 for the first occurrence and add the field lengths together. And then delete the older lines.
So in the above csv - "AB000M,10,0,80" would become "AB000M,9,80,80" and
"AB000M,21,0,80
AB000M,22,0,80" --- these two lines should be replaced by "AB000M,20,80,160".
I am trying to achieve this with this snippet but its not working:
df = pd.read_csv("file.csv")
for ind in df.index:
if ind >= len(df)-1:
break
if df['column'][ind] == 0 and df['column'][ind + 1] != 0:
df['row'][ind] -= 1
df['column'][ind] = 80
elif df['column'][ind] == 0 and df['column'][ind + 1] == 0:
df['row'][ind] -= 1
df['column'][ind] = 80
df['length_of_field'][ind] += df['length_of_field'][ind + 1]
df.drop([df.index[ind + 1]], axis=0)
Upvotes: 1
Views: 979
Reputation: 5601
sample data:
df_str = '''
name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80
AB000M,23,11,1
AB000M,24,11,1
AB000M,25,0,80
AB000M,26,0,80
AB000M,27,0,80
AB000M,28,11,1
AB000M,29,0,80
'''
df = pd.read_csv(io.StringIO(df_str.strip()), sep=',', index_col=False)
solution:
# split the row which to update or left
cond = df['column'] == 0
df_to_update = df[cond].copy()
df_left = df[~cond].copy()
# modify the update rows
df_to_update['column'] = df_to_update['length_of_field']
df_to_update['row'] -= 1
# create tag for which is diff 1 with the previous row
cond = df_to_update['row'].diff() != 1
df_to_update['tag'] = np.where(cond, 1, 0)
# cumsum tag to creat group
df_to_update['label'] = df_to_update['tag'].cumsum()
print(df_to_update)
# name row column length_of_field tag label
# 2 AB000M 9 80 80 1 1
# 5 AB000M 20 80 80 1 2
# 6 AB000M 21 80 80 0 2
# 9 AB000M 24 80 80 1 3
# 10 AB000M 25 80 80 0 3
# 11 AB000M 26 80 80 0 3
# 13 AB000M 28 80 80 1 4
# agg groupy left first row, and sum(length_of_field)
obj_list = []
for tag, group in df_to_update.groupby('label'):
obj = group.iloc[0].copy()
obj['length_of_field'] = group['length_of_field'].sum()
obj_list.append(obj)
dfn_to_update = pd.concat(obj_list,axis=1).T[df.columns]
# merge final result
dfn = df_left.append(dfn_to_update).sort_index()
result:
print(dfn)
name row column length_of_field
0 AB000M 8 12 1
1 AB000M 9 12 1
2 AB000M 9 80 80
3 AB000M 10 12 1
4 AB000M 11 1 1
5 AB000M 20 80 160
7 AB000M 23 11 1
8 AB000M 24 11 1
9 AB000M 24 80 240
12 AB000M 28 11 1
13 AB000M 28 80 80
print(df)
name row column length_of_field
0 AB000M 8 12 1
1 AB000M 9 12 1
2 AB000M 10 0 80
3 AB000M 10 12 1
4 AB000M 11 1 1
5 AB000M 21 0 80
6 AB000M 22 0 80
7 AB000M 23 11 1
8 AB000M 24 11 1
9 AB000M 25 0 80
10 AB000M 26 0 80
11 AB000M 27 0 80
12 AB000M 28 11 1
13 AB000M 29 0 80
Upvotes: 1
Reputation: 445
This is an example of something that may work for you.
import pandas as pd
df = pd.read_csv('test.csv')
newRows = []
last_val_is_zero = False
tempRow = None
for row in df.iterrows():
vals = row[1]
if vals['column'] == 0:
if not last_val_is_zero:
vals['row'] = vals['row'] - 1
vals['column'] = vals['length_of_field']
tempRow = vals
last_val_is_zero = True
else:
tempRow['length_of_field'] = tempRow['length_of_field'] + vals['length_of_field']
else:
if tempRow is not None:
newRows.append(tempRow)
newRows.append(vals)
tempRow = None
last_val_is_zero = False
if tempRow is not None:
newRows.append(tempRow)
newData = [[val for val in row] for row in newRows]
newDf = pd.DataFrame(newData, columns=[x for x in newRows[0].keys()])
Upvotes: 1