abhiraj gupta

Reputation: 33

How can I make different processes write to a common list and dataframe?

I have more than 1000 files. I would like to open each one and record the number of columns it has in another dataframe. To speed this up, I would like to use multiprocessing. Here is the code I have written:

import pandas as pd
import datetime
import os
import multiprocessing

all_files = os.listdir('E:\\2nd Set\\')
def cal(files,final_list):
    print(files)
    df = pd.read_csv('E:\\'+files)
    number_columns = df.shape[0]
    final_list.extend([files,number_columns])
    main_df.loc[main_df.shape[0]] = final_list


if __name__=='__main__':
    mgr = multiprocessing.Manager()
    main_list = mgr.list()
    p1 = multiprocessing.Pool()
    p = p1.map(cal,all_files,main_list)
    p1.start()
    p1.join()

When I run the code above, I get this error:

TypeError: '<=' not supported between instances of 'ListProxy' and 'int'

Also, how can I use a common dataframe across the processes?

Upvotes: 0

Views: 87

Answers (1)

Adon Bilivit

Reputation: 26998

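There are lots of issues here, not least the third parameter to map(), which should be an int (the chunk size). You are passing main_list, a ListProxy, in that position, and that is what causes the TypeError. Among the other issues: Pool has no start() method, join() can only be called after close() or terminate(), main_df is never defined, and df.shape[0] is the number of rows, not columns (that would be df.shape[1]).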
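
A minimal sketch of one way to restructure this, assuming each file is a CSV with a header row: rather than having every process write to a shared list or dataframe, let each worker return its result and build the dataframe once in the parent process. The function names count_columns and to_frame and the output column names are my own, not from your code, and I have not run this against your files.

```python
import multiprocessing
import os

import pandas as pd

# Folder taken from the question; adjust as needed.
DATA_DIR = "E:\\2nd Set\\"


def count_columns(path):
    """Return (file name, number of columns) for one CSV file."""
    # nrows=0 parses only the header row, which is enough to count columns.
    header = pd.read_csv(path, nrows=0)
    return os.path.basename(path), header.shape[1]


def to_frame(results):
    """Build the summary dataframe in the parent process."""
    # Column names here are illustrative, not from the question.
    return pd.DataFrame(results, columns=["file", "number_columns"])


if __name__ == "__main__":
    paths = [os.path.join(DATA_DIR, name) for name in os.listdir(DATA_DIR)]
    # Pool.map takes (func, iterable, chunksize). The workers' return values
    # come back as a list in the same order as the input.
    with multiprocessing.Pool() as pool:
        results = pool.map(count_columns, paths)
    main_df = to_frame(results)
    print(main_df)
```

A dataframe created in the parent is not shared with the pool workers. Each worker operates on its own copy, so a write such as main_df.loc[...] inside a worker would not show up in the parent. Returning values from the worker avoids the need for a Manager list altogether.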

Upvotes: 1
