Blue Island

Reputation: 723

Pandas ValueError Arrays Must be All Same Length

My script iterates over a large list of .mp3 links, reads the metadata tags, and saves them to an Excel file. Building the DataFrame fails with the error below. I appreciate any help. Thanks.

#print is_connected();

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Links': lines, 'Titles': titles, 'Singers': finalsingers, 'Albums': finalalbums, 'Years': years})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(xlspath, engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.save()

Traceback (most recent call last):
  File "mp.py", line 87, in <module>
    df = pd.DataFrame({'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years})
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 266, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 402, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5409, in _arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5457, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length

Upvotes: 71

Views: 271607

Answers (5)

robertspierre

Reputation: 4341

You can pad the shorter lists with filler elements so that every list matches the longest one:

def pad_dict_list(dict_list, padel):
    lmax = 0
    for lname in dict_list.keys():
        lmax = max(lmax, len(dict_list[lname]))
    for lname in dict_list.keys():
        ll = len(dict_list[lname])
        if  ll < lmax:
            dict_list[lname] += [padel] * (lmax - ll)
    return dict_list


dict_list = {'Links': [1, 2, 3], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2], 'Albums': [1, 2, 3], 'Years': [1, 2, 3, 4]}
dict_list = pad_dict_list(dict_list, 0)
print(dict_list)

Output

{'Links': [1, 2, 3, 0], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2, 0, 0], 'Albums': [1, 2, 3, 0], 'Years': [1, 2, 3, 4]}
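Note that pad_dict_list above extends the original lists in place. As an alternative sketch (not part of the answer above), the same padding can be done without touching the input by using itertools.zip_longest from the standard library. The helper name padded_copy and the sample data are hypothetical, chosen only for illustration.

```python
from itertools import zip_longest


def padded_copy(columns, fill=None):
    """Return a new dict whose lists are padded to the longest length.

    Hypothetical helper: the input dict and its lists are left unchanged.
    """
    names = list(columns)
    # zip_longest walks all lists in parallel, one tuple per row,
    # substituting `fill` once a shorter list is exhausted.
    rows = list(zip_longest(*(columns[name] for name in names), fillvalue=fill))
    # Turn the rows back into one list per column name.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


data = {'Links': [1, 2, 3], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2]}
padded = padded_copy(data)
print(padded)
```

Using None as the fill value means pandas will treat the padded cells as missing data rather than as real zeros.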

Upvotes: 18

Vivek Srinivasan

Reputation: 2887

You can do this to avoid that error:

a = {'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()

Explanation:

This builds the DataFrame with each key (e.g. 'Links') as a row. In that orientation the shorter lists are simply missing some trailing columns, which is no problem for pandas: it fills the gaps with missing values. Only lists of different lengths used as columns lead to the ValueError during creation. After that you transpose the DataFrame (flip the axes) so the rows become columns, which gives the DataFrame you initially wanted.
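A small worked example may make the effect easier to see. The sample values below are made up for illustration; only the from_dict(..., orient='index') and transpose() calls come from the answer.

```python
import pandas as pd

# Made-up sample data with lists of different lengths.
data = {
    'Links': ['a.mp3', 'b.mp3', 'c.mp3'],
    'Titles': ['A', 'B'],
    'Years': [2001],
}

# Each key becomes a row first, so short lists just leave trailing cells empty.
by_row = pd.DataFrame.from_dict(data, orient='index')

# Flip the axes so each key becomes a column again.
df = by_row.transpose()
print(df)
```

The shorter columns end up with missing values in their last rows, so it is worth checking whether a gap at the end is really where the missing tag belongs.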

Upvotes: 160

Jaliya Sumanadasa

Reputation: 139

I came across the same error while reading a JSON file into a pandas DataFrame. Passing lines=True to read_json (the lines parameter is a bool that defaults to False) solved the issue.

StringData = StringIO(obj.get()['Body'].read().decode('utf-8'))
mydata = pdf.read_json(StringData, lines=True)
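For reference, here is a self-contained sketch of what lines=True does, assuming the input is in JSON Lines form (one JSON object per line). The in-memory string stands in for the file body read above, and the field names are invented for the example.

```python
from io import StringIO

import pandas as pd

# Invented sample: one JSON object per line (JSON Lines).
raw = (
    '{"title": "Song A", "year": 2001}\n'
    '{"title": "Song B", "year": 2005}\n'
)

# lines=True tells read_json to parse each line as a separate record.
records = pd.read_json(StringIO(raw), lines=True)
print(records)
```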

Upvotes: 3

cubeloid

Reputation: 169

Duplicate variable names caused this problem for me.

Upvotes: 5

kypalmer

Reputation: 464

It's telling you that the arrays (lines, titles, finalsingers, etc.) are not all the same length. You can check this with:

print(len(lines), len(titles), len(finalsingers)) # Print all of them out here

This will show you which list has the wrong length. From there you'll need to investigate why it differs and decide on the right way to correct it.
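With many columns, the same check can be wrapped in a small helper that names the lists whose length differs from the most common one. This is only a sketch: find_mismatched is a hypothetical name, and the sample lists stand in for the real lines, titles, and so on.

```python
from collections import Counter


def find_mismatched(columns):
    """Return {name: length} for lists whose length is not the most common one."""
    lengths = {name: len(values) for name, values in columns.items()}
    if not lengths:
        return {}
    # Treat the most frequent length as the expected one.
    expected = Counter(lengths.values()).most_common(1)[0][0]
    return {name: n for name, n in lengths.items() if n != expected}


# Stand-in data: 'Singers' is one element short.
columns = {
    'Links': ['a', 'b', 'c', 'd'],
    'Titles': ['t1', 't2', 't3', 't4'],
    'Singers': ['s1', 's2', 's3'],
    'Years': [2001, 2002, 2003, 2004],
}
mismatched = find_mismatched(columns)
print(mismatched)
```

A list that comes up short usually means the scraping loop skipped an append for some link, for example when a tag was missing.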

Upvotes: 14
