Reputation: 31
I want to fill in list_of_occurences with the correct item from the list grundformen.
My for-loop doesn't work as intended though. It doesn't restart from the beginning and only goes through the rows in the reader once. Therefore it won't fill the list completely.
This is what it prints. The entries at the end are missing their item, because the search doesn't restart from the beginning of the list:
# List_of_occurrences (1 line - wrapped for easier reading)
[['NN', 1328, ('Ziel',)], ['ART', 771, ('der',)],
['$.', 732, ('_',)], ['VVFIN', 682, ('schlagen',)],
['PPER', 592, ('sie',)], ['$,', 561, ('_',)],
['ADV', 525, ('So',)], ['APPR', 507, ('in',)],
['NE', 433, ('Johanna',)], ['$(', 363, ('_',)],
['VAFIN', 334, ('haben',)], ['ADJA', 307, ('tragisch',)],
['ADJD', 278, ('recht',)], ['KON', 228, ('Doch',)],
['VVPP', 194, ('reichen',)], ['VVINF', 161, ('stören',)],
['KOUS', 151, ('Während',)], ['PPOSAT', 120, ('ihr',)],
['PTKVZ', 104, ('weiter',)], ['PRF', 98, ('sich',)],
['APPRART', 90, ('zu',)], ['PTKNEG', 87, ('nicht',)],
['VMFIN', 76, ('sollen',)], ['PIAT', 66, ('kein',)],
['PIS', 65, ('etwas',)], ['PTKZU', 52, ('zu',)],
['PRELS', 51, ('wer',)], ['PROAV', 42, ('dabei',)],
['PDS', 38, ('jener',)], ['PDAT', 37, ('dieser',)],
['PWAV', 30, ('wie',)], ['PWS', 26, ('Was',)],
['CARD', 24, ('drei',)], ['KOKOM', 21, ('wie',)],
['VAINF', 18, ('werden',)], ['KOUI', 15, ('um',)],
['VMINF', 10, ('können',)], ['VVIZU', 10, ('aufklären',)],
['VAPP', 10], ['PTKA', 6], ['PTKANT', 6], ['PWAT', 4],
['VVIMP', 4], ['PRELAT', 4], ['APZR', 3], ['APPO', 2],
['FM', 1]]
# Grundformen (1 line, wrapped for reading)
['Ziel', 'der', '_', 'schlagen', 'sie', '_', 'So', 'in', 'Johanna',
'_', 'haben', 'tragisch', 'recht', 'Doch', 'reichen', 'stören',
'Während', 'ihr', 'weiter', 'sich', 'zu', 'nicht', 'sollen', 'kein',
'etwas', 'zu', 'wer', 'dabei', 'jener', 'dieser', 'wie', 'Was',
'drei', 'wie', 'werden', 'um', 'können', 'aufklären']
occurences = collections.Counter()
with open("material-2.csv", mode='r', newline='', encoding="utf-8") as material:
    reader = csv.reader(material, delimiter='\t', quotechar="\t")
    for line in reader:
        if line:
            occurences[line[5]] += 1
        else:
            pass
list_of_occurences = [list(elem) for elem in occurences.most_common()]
grundformen = []
with open('material-2.csv', mode='r', newline='', encoding="utf-8") as material:
    reader = csv.reader(material, delimiter='\t', quotechar="\t")
    for elem in list_of_occurences:
        for row in reader:
            if row != [] and row[5] == elem[0]:
                grundformen.append(row[2])
                break
iterator = 0
for elem in grundformen:
    list_of_occurences[iterator].insert(2, elem)
    iterator = iterator + 1
    pass
print(list_of_occurences)
print(grundformen)
Whole input file: https://www.dropbox.com/sh/xyktjk4ycm8x6v0/AACou438_eEWx-ZYmByBiqp_a/material-2.csv?dl=0
Part of my input file:
1 Als Als _ _ KOUS _ _ 6 6 CP CP _ _
2 es es _ _ PPER _ 3|Nom|Sg|Neut 6 6 SB SB _ _
3 zu zu _ _ PTKA _ _ 4 4 MO MO _ _
4 schneien schneien _ _ ADJD _ Comp|Dat|Sg|Fem 5 5 MO MO _ _
5 aufgehört aufhören _ _ VVPP _ Psp 6 6 OC OC _ _
6 hatte haben _ _ VAFIN _ 3|Sg|Past|Ind 8 8 MO MO _ _
7 , _ _ _ $, _ _ 8 8 PUNC PUNC _ _
8 verließ verlassen _ _ VVFIN _ 3|Sg|Past|Ind 0 0 ROOT ROOT _ _
9 Johanna Johanna _ _ NE _ Nom|Sg|Masc 8 8 SB SB _ _
10 von von _ _ APPR _ _ 5 5 SBP SBP _ _
11 Rotenhoff Rotenhoff _ _ NE _ Dat|Sg|Neut 10 10 NK NK _ _
12 , _ _ _ $, _ _ 8 8 PUNC PUNC _ _
13 ohne ohne _ _ KOUI _ _ 18 18 CP CP _ _
14 ein ein _ _ ART _ Nom|Sg|Neut 16 16 NK NK _ _
15 rechtes recht _ _ ADJA _ Pos|Nom|Sg|Neut 16 16 NK NK _ _
16 Ziel Ziel _ _ NN _ Nom|Sg|Neut 18 18 OA OA _ _
17 zu zu _ _ PTKZU _ _ 18 18 PM PM _ _
18 haben haben _ _ VAINF _ Inf 8 8 MO MO _ _
19 , _ _ _ $, _ _ 18 18 PUNC PUNC _ _
20 das der _ _ ART _ Nom|Sg|Neut 21 21 NK NK _ _
21 Gutshaus Gutshaus _ _ NN _ Nom|Sg|Neut 16 16 APP APP _ _
22 . _ _ _ $. _ _ 8 8 PUNC PUNC _ _
How can I change my loop so that it fills in everything?
Upvotes: 1
Views: 174
Reputation: 8322
You had an issue with how you were reading in your csv data. Here the data is read into a list, which the second loop can go through instead of opening another file object. But you don't even need to loop through the csv data twice:
import csv
import collections

occurences = collections.Counter()
grundformen = collections.defaultdict(list)
with open("material-2.csv", mode='r', newline='', encoding="utf-8") as material:
    # Read all non-empty rows into a list so they can be iterated more than once.
    reader = [ln for ln in csv.reader(material, delimiter='\t', quotechar="\t") if ln]
for line in reader:
    # Count each tag (column 5) and collect its base forms (column 2) in one pass.
    occurences[line[5]] += 1
    grundformen[line[5]].append(line[2])
list_of_occurences = list(map(list, occurences.most_common()))
for elem in list_of_occurences:
    # Append the first base form seen for this tag.
    elem.append(grundformen[elem[0]][0])
print(list_of_occurences)
By making a list out of your csv data, you can call break and still start afresh at the head of the list for your next loop. A csv.reader, by contrast, is an iterator: even after a break, the next loop resumes where the previous one left off, and once its data is exhausted it yields nothing more.
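The difference can be seen in a small self-contained sketch. The three sample rows and the helper find_all are made up for illustration, and io.StringIO stands in for the real file:

```python
import csv
import io

# Hypothetical three-row sample standing in for material-2.csv (tab-separated).
sample = "1\tAls\tAls\n2\tes\tes\n3\tzu\tzu\n"

def find_all(rows, wanted):
    """For each wanted word, scan rows and stop at the first match."""
    found = []
    for word in wanted:
        for row in rows:
            if row[1] == word:
                found.append(row[2])
                break
    return found

wanted = ["zu", "Als"]  # "Als" appears before "zu" in the data

# A csv.reader is a one-shot iterator: after the break, the next inner
# loop resumes at the following row instead of restarting at the top.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
from_iterator = find_all(reader, wanted)

# A list can be iterated from the start as often as needed.
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
from_list = find_all(rows, wanted)

print(from_iterator)  # ['zu']
print(from_list)      # ['zu', 'Als']
```

With the iterator, the search for "Als" finds nothing because the reader was already consumed while looking for "zu"; with the list, both searches start from the first row.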
Upvotes: 0
Reputation: 5440
reader = csv.reader(material, delimiter='\t', quotechar="\t")
Setting the quotechar to the same character as the delimiter looks rather strange. The CSV reader will probably get confused and either take all tabs (\t) as delimiters or interpret them all as quotechars.
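If the file contains no quoted fields at all, one option is to switch quote handling off rather than pick a quote character. A minimal sketch, assuming that holds for this data; the sample line is made up and io.StringIO stands in for the file:

```python
import csv
import io

# Hypothetical tab-separated line that contains a literal double quote.
sample = '1\t"\t_\t_\t_\t$(\n'

# quoting=csv.QUOTE_NONE makes the reader treat quote characters as
# ordinary text, so only the tab acts as a field separator.
reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)
print(rows)  # [['1', '"', '_', '_', '_', '$(']]
```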
Upvotes: 1