Reputation: 4651
An image will be useful to understand this problem:
the .txt file on the left hand side are customers order history. A column in the text file represents the order id number (order_idno). On the right hand side we have the transaction data of a company. This operates on various sites and the period of time is divided over several weeks.
.txt files are stored in a folder. Each individual file is named i.e. "By_Customer_Id_01120.txt"and it looks like:
We are interested in the first column(order_idno).
Looking at the data from the excel file, once imported in a pandas DataFrame it looks like
df = df[['sitename', 'order_idno', 'ts_placed']]
Out[7]:
sitename order_idno ts_placed
0 www.mattressesworld.co.uk 5242367 47
1 www.bedroomfurnitureworld.co.uk 5242295 47
2 www.bedroomfurnitureworld.co.uk 5242295 47
3 www.bedroomfurnitureworld.co.uk 5242295 47
4 www.bedroomfurnitureworld.co.uk 5242295 47
.....
where 'ts_placed' is the week number.
What i want to do is to find for each costumer(represented by each .txt file) how many times they have order on each site and in what week.
Feel free to edit the question if this needs more details or the title is to generic.
Upvotes: 2
Views: 116
Reputation: 1841
This code is not tested, it is likely to include syntax errors. It's just for you to get an idea of how to get started. It's also not efficient because I don't know exactly what information you want to get. I'm iteration over the excel sheet multiple times, if you really want to get the information of all customers everytime, it would be more efficient to read all customers into memory before iterating over the excel sheet.
First some modules we will use.
import re
You already managed to read the excel file so I will start with the loop that iterates over the customers. You can use the os.listdir function to do so.
customer_folder = "path/to/customer/folder"
for file in os.listdir(customer_folder):
Now we have a loop that iterates over the customer files. Let's print which customer we are looking at at the momemnt and check if the file name is of the expected format.
match = re.match("By_Customer_Id_([0-9]+)\.txt", file)
if match:
print("Looking at customer id '{0}':".format(match.group[1]))
Now we have to open the file, so we can make a list of all order id numbers of a customer.
with open(os.path.join(customer_folder, file) as customer:
customer_idnos = [order.split(",")[0] for order in customer]
Let's iterate over the elements in the excel document so we can get information from them. We first create a dict where we save our informations later.
sites = {}
for _, row in df.iterrows():
if row["order_idno"] in customer_idnos:
If the ordner id number we are looking at is in the order id numbers of the customer we are looking at we get our information.
if not row["sitename"] in sites:
sites[row["sitename"]] = []
sites[row["sitename"]].append(row["ts_placed"])
Let's print our informations.
for site, weeks in sites:
print(" Site '{0}' visited {1} time(s):".format(
site, len(weeks)))
for week in weeks:
print(" In week {0}".format(week))
Upvotes: 3