Reputation: 33
I have more than 1000 files that I would like to open, and I want to write the number of columns each file has into another dataframe. To speed up the process, I would like to use multiprocessing. Here is the code that I have written:
import pandas as pd
import datetime
import os
import multiprocessing

all_files = os.listdir('E:\\2nd Set\\')

def cal(files, final_list):
    print(files)
    df = pd.read_csv('E:\\' + files)
    number_columns = df.shape[0]
    final_list.extend([files, number_columns])
    main_df.loc[main_df.shape[0]] = final_list

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    main_list = mgr.list()
    p1 = multiprocessing.Pool()
    p = p1.map(cal, all_files, main_list)
    p1.start()
    p1.join()
On executing the above code, I get this error:
TypeError: '<=' not supported between instances of 'ListProxy' and 'int'
Also, how can I use a common dataframe across the processes?
Upvotes: 0
Views: 87
Reputation: 26998
There are lots of issues here, not least of which is the third parameter to map(), which should be an int (the chunk size). That is what's causing your error.
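As a rough sketch of one way to restructure it (folder path and intent taken from your question; the function and column names here are just illustrative), you can let each worker return its result and build the dataframe in the parent process instead of sharing a dataframe between processes. Note that df.shape[1] gives the column count, while shape[0] counts rows:

    import os
    import multiprocessing

    import pandas as pd

    FOLDER = 'E:\\2nd Set\\'  # path from the question

    def count_columns(filename):
        # Read one CSV and return its name and column count.
        # shape[1] is the number of columns; shape[0] would be the row count.
        df = pd.read_csv(os.path.join(FOLDER, filename))
        return filename, df.shape[1]

    if __name__ == '__main__':
        all_files = os.listdir(FOLDER)
        with multiprocessing.Pool() as pool:
            # map() hands each filename to a worker and gathers the return
            # values; no Manager list or shared dataframe is needed.
            results = pool.map(count_columns, all_files)
        # Assemble the summary dataframe once, in the parent process.
        main_df = pd.DataFrame(results, columns=['file', 'number_columns'])
        print(main_df)

There is also no need to call start() or join() on a Pool; the with block (or pool.close() followed by pool.join()) handles shutdown.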
Upvotes: 1