bellatrix

Reputation: 139

How do I find start and end indices in a Python list for all the rows?

My code:

df=pd.read_csv("file")
l1=[]
l2=[]
for i in range(0,len(df['unions']),len(df['district'])):
    l1.append(' '.join((df['unions'][i], df['district'][i])))
    l2.append(({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', df['unions'][i])] ,df['subdistrict'][i]],}))

TRAIN_DATA=list(zip(l1,l2))
print(TRAIN_DATA)

Result - [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola']})]

My expected output - [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola'],[[(10, 17)], 'AnyLabel']})]

How do I get this output for all the rows? I am getting the result for only one row. It seems like my loop is not working. Can anyone please point out my mistake?

My CSV file looks like this. "AnyLabel" is another column. I have around 500 rows:

unions        subdistrict   district 
Dhansagar     Sarankhola    Bagerhat 
Daibagnyahati Morrelganj    Bagerhat 
Ramchandrapur Morrelganj    Bagerhat 
Kodalia       Mollahat      Bagerhat

Upvotes: 1

Views: 149

Answers (2)

U13-Forward

Reputation: 71610

Try iterating with df.iterrows() and using str.join:

import re

import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []

for idx, row in df.iterrows():
    # Text for this row: "<union> <district>"
    l1.append(' '.join((row['unions'], row['district'])))
    # Offsets (inclusive end) and text of each word in "<union> <subdistrict>"
    l2.append({"entities": [[[ele.start(), ele.end() - 1], ele.group(0)] for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]})

TRAIN_DATA = list(zip(l1, l2))
print(TRAIN_DATA)

Output:

[('Dhansagar Bagerhat', {'entities': [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]}), ('Daibagnyahati Bagerhat', {'entities': [[[0, 12], 'Daibagnyahati'], [[14, 23], 'Morrelganj']]}), ('Ramchandrapur Bagerhat', {'entities': [[[0, 12], 'Ramchandrapur'], [[14, 23], 'Morrelganj']]}), ('Kodalia Bagerhat', {'entities': [[[0, 6], 'Kodalia'], [[8, 15], 'Mollahat']]})]
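
A quick way to sanity-check offsets like these is to slice the training text with them. Because the end index is inclusive (ele.end() - 1), the slice needs end + 1. This is only a minimal sketch using the first row of the output above; the helper name span_text is made up for illustration.

```python
def span_text(text, start, end_inclusive):
    """Return the substring covered by an inclusive (start, end) span."""
    return text[start:end_inclusive + 1]


text = "Dhansagar Bagerhat"  # first training text from the output above

first = span_text(text, 0, 8)
print(first)      # Dhansagar
print(len(text))  # 18
```

Note that the second span in that row, [10, 19], was measured on "Dhansagar Sarankhola" rather than on the stored text "Dhansagar Bagerhat", so it reaches past the end of the 18-character string. If the offsets are meant to index into the stored text, they would need to be computed on that same string.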

Upvotes: 1

Ofer Sadan

Reputation: 11942

You're using range incorrectly. range(0, len(df['unions']), len(df['district'])) starts at 0 and stops before len(df['unions']), but it advances in steps of len(df['district']), which is the same number. The second value would already equal the stop, so the loop only ever visits i = 0, the first row. You can see that by printing out the row numbers:

for i in range(0,len(df['unions']),len(df['district'])):
    print(i)
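
The same effect can be reproduced without the CSV. A minimal sketch, assuming a hypothetical row count n = 4 that stands in for both len(df['unions']) and len(df['district']):

```python
n = 4  # hypothetical row count; all columns of a DataFrame have the same length

# Step equal to the stop value: only the start index is produced.
only_first = list(range(0, n, n))

# Default step of 1: every row index is produced.
all_rows = list(range(n))

print(only_first)  # [0]
print(all_rows)    # [0, 1, 2, 3]
```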

Also, you shouldn't iterate over rows like that anyway; use df.iterrows() instead:

import re

import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []

# iterrows() yields (index, row) for every row, so no manual range is needed
for i, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    l2.append({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]]})
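
If the goal is the shape from the question (one span per word of the joined text, each paired with a label), the per-row work could look roughly like the sketch below. It is not taken from either answer: the function name build_example is made up, "AnyLabel" is the placeholder label from the question, and it assumes the union and district names are single words. The rows are hard-coded from the sample table so the snippet runs without the CSV; with pandas, the function would be called once per row inside the df.iterrows() loop.

```python
import re


def build_example(union, subdistrict, district, district_label="AnyLabel"):
    """Build one (text, annotations) pair with inclusive end offsets."""
    text = " ".join((union, district))
    # (start, inclusive end) of every whitespace-separated word in the text
    spans = [(m.start(), m.end() - 1) for m in re.finditer(r"\S+", text)]
    # First word gets the subdistrict as its label, second the placeholder
    labels = [subdistrict, district_label]
    entities = [[span, label] for span, label in zip(spans, labels)]
    return text, {"entities": entities}


# Sample rows copied from the question's table: (unions, subdistrict, district)
rows = [
    ("Dhansagar", "Sarankhola", "Bagerhat"),
    ("Daibagnyahati", "Morrelganj", "Bagerhat"),
    ("Ramchandrapur", "Morrelganj", "Bagerhat"),
    ("Kodalia", "Mollahat", "Bagerhat"),
]

TRAIN_DATA = [build_example(*row) for row in rows]
print(TRAIN_DATA[0])
# ('Dhansagar Bagerhat', {'entities': [[(0, 8), 'Sarankhola'], [(10, 17), 'AnyLabel']]})
```

Here the offsets are computed on the same string that is stored as the training text, so each span can be sliced back out of it.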

Upvotes: 1
