My code:
import re
import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []
for i in range(0, len(df['unions']), len(df['district'])):
    l1.append(' '.join((df['unions'][i], df['district'][i])))
    l2.append({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', df['unions'][i])], df['subdistrict'][i]]})
TRAIN_DATA = list(zip(l1, l2))
print(TRAIN_DATA)
Result: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola']})]
My expected output: [('Dhansagar Bagerhat', {'entities': [[[(0, 8)], 'Sarankhola'], [[(10, 17)], 'AnyLabel']]})]
How do I get this output for all the rows? I only get the result for the first row, so it seems my loop is not working. Can anyone point out my mistake?
My CSV file looks like this ("AnyLabel" is another column). It has around 500 rows:
unions         subdistrict  district
Dhansagar      Sarankhola   Bagerhat
Daibagnyahati  Morrelganj   Bagerhat
Ramchandrapur  Morrelganj   Bagerhat
Kodalia        Mollahat     Bagerhat
Answer 1:
Try using str.join on each row from df.iterrows():
import re
import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []
for idx, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    l2.append({"entities": [[[ele.start(), ele.end() - 1], ele.group(0)] for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]})
TRAIN_DATA = list(zip(l1, l2))
print(TRAIN_DATA)
Output:
[('Dhansagar Bagerhat', {'entities': [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]}), ('Daibagnyahati Bagerhat', {'entities': [[[0, 12], 'Daibagnyahati'], [[14, 23], 'Morrelganj']]}), ('Ramchandrapur Bagerhat', {'entities': [[[0, 12], 'Ramchandrapur'], [[14, 23], 'Morrelganj']]}), ('Kodalia Bagerhat', {'entities': [[[0, 6], 'Kodalia'], [[8, 15], 'Mollahat']]})]
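Note that this answer computes the offsets on unions + subdistrict (e.g. 'Dhansagar Sarankhola'), while l1 stores unions + district ('Dhansagar Bagerhat'), so a span such as [10, 19] does not line up with the stored text. A minimal sketch that computes both spans on the same string as l1 and attaches one label per span, in the [[(start, end)], label] shape from the question's expected output. Assumptions: the data is built inline from the sample rows, "AnyLabel" is a hypothetical column name for the extra label column, and end offsets stay inclusive (end - 1) as in the question.

```python
import pandas as pd

# Sample rows from the question; in practice: df = pd.read_csv("file").
# "AnyLabel" is a hypothetical column standing in for the extra label column.
df = pd.DataFrame({
    "unions": ["Dhansagar", "Daibagnyahati", "Ramchandrapur", "Kodalia"],
    "subdistrict": ["Sarankhola", "Morrelganj", "Morrelganj", "Mollahat"],
    "district": ["Bagerhat"] * 4,
    "AnyLabel": ["AnyLabel"] * 4,
})

TRAIN_DATA = []
for row in df.itertuples(index=False):
    # Same text as l1 in the question: "<union> <district>".
    text = " ".join((row.unions, row.district))
    # The district starts right after the union name plus one space.
    d_start = len(row.unions) + 1
    entities = [
        # Inclusive end offsets, matching ele.end() - 1 in the question.
        [[(0, len(row.unions) - 1)], row.subdistrict],
        [[(d_start, d_start + len(row.district) - 1)], row.AnyLabel],
    ]
    TRAIN_DATA.append((text, {"entities": entities}))

print(TRAIN_DATA[0])
# ('Dhansagar Bagerhat', {'entities': [[[(0, 8)], 'Sarankhola'], [[(10, 17)], 'AnyLabel']]})
```

Computing the offsets from the string lengths, rather than with re.finditer(r'\S+', ...), keeps exactly one span per column even if a union or district name contains a space.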
Answer 2:
You're using range wrong. range(0, len(df['unions']), len(df['district'])) counts from 0 up to (but not including) len(df['unions']) in steps of len(df['district']). Both columns have the same length, so the step equals the stop value and the loop runs exactly once, with i = 0: only the first row is processed. You can see that by printing the loop variable:
for i in range(0, len(df['unions']), len(df['district'])):
    print(i)
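The same behavior can be seen with range alone. A minimal sketch, where n_rows is a hypothetical stand-in for the column length (4 for the sample rows in the question):

```python
# n_rows stands in for len(df['unions']) and len(df['district']),
# which are equal because both columns belong to the same DataFrame.
n_rows = 4

# range(start, stop, step) with step == stop yields only `start`.
buggy = list(range(0, n_rows, n_rows))
print(buggy)  # [0]

# Without the step argument, every row position is visited.
fixed = list(range(0, n_rows))
print(fixed)  # [0, 1, 2, 3]
```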
Also, you shouldn't iterate over rows by position like that anyway; use df.iterrows() instead:
import re
import pandas as pd

df = pd.read_csv("file")
l1 = []
l2 = []
for i, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    l2.append({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]]})
TRAIN_DATA = list(zip(l1, l2))
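df.iterrows() yields (index, row) pairs, where row is a pandas Series keyed by column name, so every row is visited once regardless of the index values. A minimal sketch on two of the sample rows; the vectorized Series.str.cat line is an alternative for building the l1 strings without a Python loop, not something taken from this answer.

```python
import pandas as pd

# Two of the sample rows from the question.
df = pd.DataFrame({
    "unions": ["Dhansagar", "Daibagnyahati"],
    "subdistrict": ["Sarankhola", "Morrelganj"],
    "district": ["Bagerhat", "Bagerhat"],
})

texts = []
for idx, row in df.iterrows():
    # idx is the index label; row is a Series indexed by column name.
    texts.append(" ".join((row["unions"], row["district"])))

# Vectorized alternative: concatenate the two columns element-wise.
texts_vec = df["unions"].str.cat(df["district"], sep=" ").tolist()

print(texts)  # ['Dhansagar Bagerhat', 'Daibagnyahati Bagerhat']
```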