Blue Moon
Blue Moon

Reputation: 4651

How to link operations between multiple .txt files in python?

An image will be useful to understand this problem: enter image description here

the .txt file on the left hand side are customers order history. A column in the text file represents the order id number (order_idno). On the right hand side we have the transaction data of a company. This operates on various sites and the period of time is divided over several weeks.

.txt files are stored in a folder. Each individual file is named i.e. "By_Customer_Id_01120.txt"and it looks like:

enter image description here

We are interested in the first column(order_idno).

Looking at the data from the excel file, once imported in a pandas DataFrame it looks like

df = df[['sitename', 'order_idno', 'ts_placed']]

Out[7]: 
                              sitename  order_idno  ts_placed
0            www.mattressesworld.co.uk     5242367         47
1      www.bedroomfurnitureworld.co.uk     5242295         47
2      www.bedroomfurnitureworld.co.uk     5242295         47
3      www.bedroomfurnitureworld.co.uk     5242295         47
4      www.bedroomfurnitureworld.co.uk     5242295         47
.....

where 'ts_placed' is the week number.

What i want to do is to find for each costumer(represented by each .txt file) how many times they have order on each site and in what week.

Feel free to edit the question if this needs more details or the title is to generic.

Upvotes: 2

Views: 116

Answers (1)

timakro
timakro

Reputation: 1841

This code is not tested, it is likely to include syntax errors. It's just for you to get an idea of how to get started. It's also not efficient because I don't know exactly what information you want to get. I'm iteration over the excel sheet multiple times, if you really want to get the information of all customers everytime, it would be more efficient to read all customers into memory before iterating over the excel sheet.

First some modules we will use.

import re

You already managed to read the excel file so I will start with the loop that iterates over the customers. You can use the os.listdir function to do so.

customer_folder = "path/to/customer/folder"
for file in os.listdir(customer_folder):

Now we have a loop that iterates over the customer files. Let's print which customer we are looking at at the momemnt and check if the file name is of the expected format.

    match = re.match("By_Customer_Id_([0-9]+)\.txt", file)
    if match:
        print("Looking at customer id '{0}':".format(match.group[1]))

Now we have to open the file, so we can make a list of all order id numbers of a customer.

        with open(os.path.join(customer_folder, file) as customer:
            customer_idnos = [order.split(",")[0] for order in customer]

Let's iterate over the elements in the excel document so we can get information from them. We first create a dict where we save our informations later.

            sites = {}
            for _, row in df.iterrows():
                if row["order_idno"] in customer_idnos:

If the ordner id number we are looking at is in the order id numbers of the customer we are looking at we get our information.

                    if not row["sitename"] in sites:
                        sites[row["sitename"]] = []
                    sites[row["sitename"]].append(row["ts_placed"])

Let's print our informations.

            for site, weeks in sites:
                print("    Site '{0}' visited {1} time(s):".format(
                                 site,        len(weeks)))
                for week in weeks:
                    print("        In week {0}".format(week))

Upvotes: 3

Related Questions