js352

Reputation: 374

Python multiprocessing error when passing a list

I am trying to use the multiprocessing library to speed up reading CSV files. I've done so using Pool, and now I'm trying to do it with Process(). However, when I run the code, I get the following error:

AttributeError: 'tuple' object has no attribute 'join'

Can someone tell me what's wrong? I don't understand the error.

import glob
import pandas as pd
from multiprocessing import Process
import matplotlib.pyplot as plt
import os

location = "/home/data/csv/"

uber_data = []

def read_csv(filename):

    return uber_data.append(pd.read_csv(filename))

def data_wrangling(uber_data):
    uber_data['Date/Time'] = pd.to_datetime(uber_data['Date/Time'], format="%m/%d/%Y %H:%M:%S")
    uber_data['Dia Setmana'] = uber_data['Date/Time'].dt.day_name()
    uber_data['Num dia'] = uber_data['Date/Time'].dt.dayofweek

    return uber_data

def plotting(uber_data):

    weekdays = uber_data.pivot_table(index=['Num dia','Dia Setmana'], values='Base', aggfunc='count')
    weekdays.plot(kind='bar', figsize=(8,6))
    plt.ylabel('Total Journeys')
    plt.title('Journey on Week Day')

def main():

    processes = []
    files = list(glob.glob(os.path.join(location,'*.csv*')))

    for i in files:
        p = Process(target=read_csv, args=[i])
        processes.append(p)
        p.start()

    for process in enumerate(processes):
        process.join()


    #combined_df = pd.concat(df_list, ignore_index=True)
    #dades_mod = data_wrangling(combined_df)
    #plotting(dades_mod)

main()

Thank you.

Upvotes: 0

Views: 213

Answers (1)

Neil

Reputation: 3281

I'm not 100% sure how Process works in this context, but what you have written here:

for process in enumerate(processes):
    process.join()

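will throw an error, and you can see why just from knowing the builtins. Calling enumerate on any iterable yields tuples of the form (counter, item), so each `process` in your loop is a tuple like (0, <Process ...>), not a Process. Tuples have no join method, hence the AttributeError.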
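A quick way to see this for yourself, as a minimal sketch that uses a plain list of strings as a stand-in for the list of Process objects:

```python
# Stand-in list; in the question these would be Process objects
items = ["first", "second"]

pairs = list(enumerate(items))
print(pairs)  # [(0, 'first'), (1, 'second')]

# Each element is a tuple, and tuples have no .join() method,
# which is what raises AttributeError: 'tuple' object has no attribute 'join'
print(type(pairs[0]).__name__)    # tuple
print(hasattr(pairs[0], "join"))  # False

# Unpacking the tuple gives back the original item alongside its counter
for i, item in enumerate(items):
    print(i, item)
```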

Try this for a start:

for i, process in enumerate(processes): # unpack each (counter, process) tuple: i is the counter, process is the Process object
    process.join()

Or this:

for process in processes:
    process.join()
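Even with the loop fixed, keep in mind that each Process runs in its own memory space, so `uber_data.append(...)` inside the child does not change the parent's `uber_data` list; it will still be empty after `join()`. Below is a minimal sketch of one way to send results back, assuming a `multiprocessing.Queue`. The names `read_one` and `run_all` are hypothetical, and the worker just upper-cases the filename as a stand-in for `pd.read_csv(filename)` so the example runs without any CSV files:

```python
from multiprocessing import Process, Queue


def read_one(filename, queue):
    # Hypothetical stand-in for pd.read_csv(filename): put a result on the queue
    queue.put(filename.upper())


def run_all(filenames):
    queue = Queue()
    processes = [Process(target=read_one, args=(name, queue)) for name in filenames]
    for p in processes:
        p.start()
    # Drain the queue before joining, one result per process
    results = [queue.get() for _ in processes]
    # Iterate over the processes directly; no enumerate needed
    for p in processes:
        p.join()
    # Completion order is not deterministic, so sort for a stable result
    return sorted(results)


if __name__ == "__main__":
    print(run_all(["a.csv", "b.csv"]))  # ['A.CSV', 'B.CSV']
```

The queue is drained before `join()` because the multiprocessing docs warn that joining a process that still has items buffered in a queue can deadlock. The `if __name__ == "__main__":` guard is required on platforms that start children with the spawn method (e.g. Windows and macOS). Since you already have this working with Pool, `pool.map(pd.read_csv, files)` returns the DataFrames directly and is usually the simpler option.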

For more on enumerate, see the built-in function documentation: https://docs.python.org/3/library/functions.html#enumerate

Upvotes: 1
