Reputation: 723
My script iterates over a large list of .mp3 links, reads the metadata tags from each one, and saves them to an Excel file. Building the DataFrame fails with the error below. I'd appreciate any help. Thanks.
#print is_connected();
# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(xlspath, engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
#df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Traceback (most recent call last):
  File "mp.py", line 87, in <module>
    df = pd.DataFrame({'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years})
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 266, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 402, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5409, in _arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5457, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length
Upvotes: 71
Views: 271607
Reputation: 4341
You can pad the shorter lists with a filler element so that every list ends up the same length:
def pad_dict_list(dict_list, padel):
    # Find the length of the longest list.
    lmax = 0
    for lname in dict_list.keys():
        lmax = max(lmax, len(dict_list[lname]))
    # Extend every shorter list in place with the padding element.
    for lname in dict_list.keys():
        ll = len(dict_list[lname])
        if ll < lmax:
            dict_list[lname] += [padel] * (lmax - ll)
    return dict_list
dict_list = {'Links': [1, 2, 3], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2], 'Albums': [1, 2, 3], 'Years': [1, 2, 3, 4]}
dict_list = pad_dict_list(dict_list, 0)
print(dict_list)
Output
{'Links': [1, 2, 3, 0], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2, 0, 0], 'Albums': [1, 2, 3, 0], 'Years': [1, 2, 3, 4]}
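Note that pad_dict_list extends the original lists in place. If you would rather leave them untouched, a non-mutating variant might look like the sketch below. The names padded_copy and fill are made up for this illustration, and None is used as the filler on the assumption that a missing value is preferable to 0 for tag data.

```python
def padded_copy(columns, fill=None):
    """Return a new dict whose lists are all padded to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: list(values) + [fill] * (longest - len(values))
        for name, values in columns.items()
    }


# Made-up sample data with uneven list lengths.
original = {"Links": [1, 2, 3], "Titles": [1, 2, 3, 4], "Singers": [1, 2]}
padded = padded_copy(original)
print(padded)
```

The padded dict can then be passed to pd.DataFrame as usual, while the original lists keep their lengths for debugging.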
Upvotes: 18
Reputation: 2887
You can do this to avoid that error:
a = {'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()
Explanation:
This creates the DataFrame as if each key (e.g. 'Links') were a row. The shorter lists then simply have fewer columns, and pandas fills those gaps with missing values without complaint (only missing rows lead to a ValueError during creation). After that you transpose the DataFrame (flip the axes) so the rows become columns, which gives you the DataFrame you originally wanted.
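To see the effect on a small scale, here is a minimal sketch with made-up data standing in for the real lists (the values and the variable name columns are assumptions for illustration only):

```python
import pandas as pd

# Made-up lists of different lengths, like the ones in the question.
columns = {
    "Links": ["a.mp3", "b.mp3", "c.mp3"],
    "Titles": ["Song A", "Song B"],
    "Years": [2001],
}

# Each key becomes a row first, so ragged lengths are allowed;
# transposing then turns the keys back into columns.
df = pd.DataFrame.from_dict(columns, orient="index").transpose()
print(df)
```

One thing to keep in mind: because the intermediate frame mixes types within each column, the transposed columns typically come back with object dtype, so numeric columns such as Years may need converting afterwards.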
Upvotes: 160
Reputation: 139
I came across the same error while reading a JSON file into a pandas DataFrame. Passing lines=True to read_json (the lines parameter is a bool that defaults to False) solved the issue.
StringData = StringIO(obj.get()['Body'].read().decode('utf-8'))
mydata = pd.read_json(StringData, lines=True)
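With lines=True, read_json treats the input as one JSON object per line instead of a single JSON document. A minimal sketch, using a made-up in-memory string in place of the obj.get() body above:

```python
from io import StringIO

import pandas as pd

# Made-up JSON Lines input: one complete JSON object per line.
json_lines = (
    '{"title": "Song A", "singer": "X"}\n'
    '{"title": "Song B", "singer": "Y"}\n'
)

# lines=True parses each line as its own record (one row per line).
mydata = pd.read_json(StringIO(json_lines), lines=True)
print(mydata)
```

This only helps when the file really is in that line-delimited format; it does not fix lists of unequal length like those in the question.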
Upvotes: 3
Reputation: 464
It's telling you that the arrays (lines, titles, finalsingers, etc.) are not all the same length. You can check this with:
print(len(lines), len(titles), len(finalsingers)) # Print all of them out here
This shows you which list is shorter or longer than the rest. From there you need to investigate why entries are missing and decide on the right way to correct it.
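If there are many lists, a small helper can make the mismatch easier to spot. This is only a sketch: report_lengths and the sample data are made up for illustration, and in the real script you would pass the same dict you hand to pd.DataFrame.

```python
def report_lengths(columns):
    """Print each list's length, flag the short ones, and return the lengths."""
    lengths = {name: len(values) for name, values in columns.items()}
    longest = max(lengths.values(), default=0)
    for name, n in lengths.items():
        note = "" if n == longest else f"  <-- {longest - n} short"
        print(f"{name:<8} {n}{note}")
    return lengths


# Made-up sample data standing in for lines, titles, finalsingers, ...
sample = {
    "Links": ["a.mp3", "b.mp3", "c.mp3"],
    "Titles": ["Song A", "Song B", "Song C"],
    "Singers": ["X", "Y"],
}
lengths = report_lengths(sample)
```

The question does not show the loop that fills the lists, so this is a guess, but a common cause is appending to a list only when a tag is present. Appending a placeholder such as None when a tag is missing keeps all lists aligned with the links.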
Upvotes: 14