Reputation: 3029
I have two files which look exactly the same:
file1
1 in seattle today the secretary of education richard riley delivered his address
1 one of the things he focused on as the president had done
1 abc's michele norris has been investigating this
2 we're going to take a closer look tonight at the difficulty of getting meaningful
file2
1 in seattl today the secretari of educ richard riley deliv hi address
1 one of the thing he focus on a the presid had done
1 abc michel norri ha been investig thi
2 we'r go to take a closer look tonight at the difficulti of get meaning
When I run this code:
from collections import defaultdict

result = defaultdict(list)
with open("onthis.txt", "r") as filer:
    for line in filer:
        label, sentence = line.strip().split(' ', 1)
        result[label].append(sentence)
It works perfectly for file1 but gives me a value error for file2:
label, sentence= line.strip().split(' ', 1)
ValueError: need more than 1 value to unpack
I can't see the reason, since both files are in the same format. So I removed the empty lines with this terminal command:
sed '/^$/d' onthis.txt > trial
But the same error appears.
Upvotes: 0
Views: 93
Reputation: 2688
Based on your edit, I suspect you still have "empty" lines in your text file. Or, to put it more precisely: lines filled with nothing but white space.
I've extended your example file:
1 in seattl today the secretari of educ richard riley deliv hi address
1 one of the thing he focus on a the presid had done
1 abc michel norri ha been investig thi
2 we'r go to take a closer look tonight at the difficulti of get meaning
3 foo
4 bar
5 qun
It's probably not visible here, but the line between 3 foo and 4 bar is filled with a couple of white spaces, while the lines between 4 bar and 5 qun are "just" newlines (\n).
Notice the output of sed '/^$/d':
1 in seattl today the secretari of educ richard riley deliv hi address
1 one of the thing he focus on a the presid had done
1 abc michel norri ha been investig thi
2 we'r go to take a closer look tonight at the difficulti of get meaning
3 foo
4 bar
5 qun
The empty lines are truly removed, no doubt. But the pseudo-empty white space line is still there. Running your Python script will throw an error when it reaches the line after this one:
2 we'r go to take a closer look tonight at the difficulti of get meaning
3 foo
Traceback (most recent call last):
  File "python.py", line 9, in <module>
    label, sentence= line.strip().split(' ', 1)
ValueError: need more than 1 value to unpack
So my suggestion would be to extend your script by one line, making it skip empty lines in your input file.
for line in filer:
    if not line.strip():
        continue
Doing so has the positive side effect that you don't have to prepare your input files with some sed magic beforehand.
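Put together, the loop might look like the sketch below. It is only an illustration: the function name group_by_label is hypothetical, and skipping lines that have a label but no sentence is my assumption, not something taken from your script.

```python
from collections import defaultdict


def group_by_label(lines):
    """Group sentences by their leading label, skipping blank lines.

    `group_by_label` is a hypothetical helper used for illustration only.
    """
    result = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Empty or whitespace-only line: nothing to unpack, so skip it.
            continue
        parts = stripped.split(' ', 1)
        if len(parts) != 2:
            # Assumption: a label without a sentence is skipped as well.
            continue
        label, sentence = parts
        result[label].append(sentence)
    return result


# Sample input with a whitespace-only line and a truly empty line.
sample = ["1 abc michel norri\n", "   \n", "\n", "2 we'r go to take\n", "3 foo\n"]
grouped = group_by_label(sample)
print(dict(grouped))
```

With a real file you would pass the open file object instead of the sample list, for example by calling group_by_label(filer) inside your with block.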
Upvotes: 1
Reputation: 612
I ran the code you provided with one tweak (a plain dict instead of a defaultdict), and it seems to give the expected result.
result = {}
with open("test.txt", "r") as filer:
    for line in filer:
        label, sentence = line.strip().split(' ', 1)
        try:
            result[label].append(sentence)
        except KeyError:
            result[label] = [sentence]
Output:
{'2': ["we'r go to take a closer look tonight at the difficulti of get meaning"], '1': ['in seattl today the secretari of educ richard riley deliv hi address', 'one of the thing he focus on a the presid had done', 'abc michel norri ha been investig thi']}
So this must mean that something is missing from what you have provided. If the above doesn't give you what you need, more information is required.
Upvotes: -1
Reputation: 5252
They can't be exactly the same. My guess is that there is an empty / white-space-only line somewhere in your second file, most likely right at the end.
The error is telling you that when the split is performed, there are no spaces to split on, so only one value is returned rather than one each for label and sentence.
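You can check this yourself with a few lines of Python. The snippet below is a rough sketch: find_bad_lines is a hypothetical helper name I made up, and the sample lines are shortened stand-ins for your file, not its real contents.

```python
def find_bad_lines(lines):
    """Return 1-based numbers of lines that do not split into label and sentence.

    `find_bad_lines` is a hypothetical helper used for illustration only.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if len(line.strip().split(' ', 1)) != 2:
            bad.append(number)
    return bad


# A whitespace-only line strips down to '', and splitting '' on ' '
# yields a single-element list, so unpacking into two names fails.
print("   \n".strip().split(' ', 1))

# Shortened sample: line 2 is whitespace-only, line 4 is empty.
sample = ["1 in seattl today\n", "   \n", "2 we'r go\n", "\n"]
print(find_bad_lines(sample))
```

Running the helper over the lines of your second file should point at the line (or lines) that trigger the ValueError.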
Upvotes: 1