Reputation: 1819
I'm trying to merge a single data column from 40 nearly identical CSV files with Pandas. The files contain information about Windows processes, generated in CSV form by the Windows 'Tasklist' command.
What I want to do is merge the memory information from these files into a single file, using the PID as the key. However, some random insignificant processes appear every now and then and cause inconsistency among the CSV files: one file might have 65 rows while another has 75. Those random processes are not significant, their changing PIDs should not matter, and they should simply be dropped when the files are merged.
This is how I first tried to do it:
# CSV files have following columns
# Image Name, PID, Session Name, Session #, Mem Usage
file1 = pd.read_csv("tasklist1.txt")
file1 = file1.drop(file1.columns[[2,3]], axis=1)
for i in range(2, 41):
    filename = "tasklist" + str(i) + ".txt"
    filei = pd.read_csv(filename)
    filei = filei.drop(filei.columns[[0, 2, 3]], axis=1)
    file1 = file1.merge(filei, on='PID')
file1.to_csv("Final.txt", index=False)
From the first CSV file I drop the Session Name and Session # columns but keep the Image Name column as a label for each row. From each subsequent file I keep only the PID and Mem Usage columns and merge them into the ever-growing result frame.
The problem is that when the loop reaches the 5th iteration, the merge fails with the error "Reindexing only valid with uniquely valued Index objects".
So I can merge the 1st file with the 2nd through 4th inside the first loop. If I then use a second loop to merge the 5th file with the 6th through 8th, and finally merge the two intermediate results, the data from files 1 to 8 merges perfectly fine.
Any suggestions on how to perform this kind of chained merge without creating x amount of additional loops? At this point I'm experimenting with 40 files and could brute-force the whole process with nested loops, but that isn't an effective way of merging in the first place, and it would be unacceptable if I need to scale this up to merge even more files.
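One alternative I've been considering (a sketch only, shown here with small synthetic snapshots standing in for the real Tasklist dumps): index each file by PID and align everything in a single pd.concat instead of chained merges. With join="inner", only PIDs present in every snapshot survive, so the transient processes drop out without any extra bookkeeping.

```python
import pandas as pd

# Synthetic demo data in place of the real Tasklist output: three small
# snapshots, with one transient process (PID 999) in the first file only.
snapshots = {
    "tasklist1.txt": "Image Name,PID,Session Name,Session #,Mem Usage\n"
                     "svchost.exe,100,Services,0,5000 K\n"
                     "explorer.exe,200,Console,1,40000 K\n"
                     "random.exe,999,Console,1,100 K\n",
    "tasklist2.txt": "Image Name,PID,Session Name,Session #,Mem Usage\n"
                     "svchost.exe,100,Services,0,5100 K\n"
                     "explorer.exe,200,Console,1,41000 K\n",
    "tasklist3.txt": "Image Name,PID,Session Name,Session #,Mem Usage\n"
                     "svchost.exe,100,Services,0,5200 K\n"
                     "explorer.exe,200,Console,1,39000 K\n",
}
for name, text in snapshots.items():
    with open(name, "w") as f:
        f.write(text)

frames = []
for i in range(1, 4):  # range(1, 41) for the real 40 files
    df = pd.read_csv(f"tasklist{i}.txt")
    # Keep only the memory column, keyed by PID, with a unique name per file
    # so the columns never collide.
    col = df.set_index("PID")["Mem Usage"].rename(f"Mem Usage {i}")
    frames.append(col)

# join="inner" keeps only PIDs present in every snapshot, dropping the
# transient processes automatically.
merged = pd.concat(frames, axis=1, join="inner")
merged.to_csv("Final.txt")
```

If the Image Name labels are still wanted, they could presumably be joined back afterwards from the first file's PID-to-name mapping.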
Upvotes: 1
Views: 818