zaplec
zaplec

Reputation: 1819

How to merge single data column from multiple csv files into one with Pandas?

I'm trying to merge a single data column from 40 almost similar csv files with Pandas. The files contains info from windows processes in csv form generated by Windows 'Tasklist' command.

What I want to do is, to merge the memory information from these files into a single file by using the PID as the key. However there are some random insignificant processes that appear every now and then, but cause inconsistency among the csv files. Meaning that in some file there might be 65 rows and in some files 75 rows. However those random processes are not significant and their changing PID should not matter and they should also be dropped off when merging the files.

This is how I first tried to do it:

# CSV files have following columns
# Image Name, PID, Session Name, Session #, Mem Usage

file1 = pd.read_csv("tasklist1.txt")
file1 = file1.drop(file1.columns[[2,3]], axis=1)    

for i in range(2,41):

    filename = "tasklist" + str(i) + ".txt"

    filei = pd.read_csv(filename)
    filei = filei.drop(filei.columns[[0,2,3]], axis=1)

    file1 = file1.merge(filei, on='PID')


file1.to_csv("Final.txt", index=False)

From the first csv file I just drop the Session Name and Session # columns, but keep the Image Names just as the titles for each row. Then from the following csv files I just keep the PID and Mem Usage columns and try to merge the previous all the time growing csv file with the data from upcoming file.

The problem here is that when the loop comes to 5th iteration, it cannot merge the files anymore as I get the "Reindexing only valid with uniquely valued Index objects" error.

So I can merge 1st file with 2nd to 4th inside the first loop. If I then create second loop where I merge the 5th file to 6th to 8th file and then merge these two merged files together, all the data from files 1 to 8 will be merged just perfectly fine.

Any suggestion how to perform this kind of chained merge without creating x amount of additional loops? At this point I'm experimenting with 40 files and could actually go through the whole process by brute force this with nested loops, but that isn't effective way of merging in the first place and unacceptable, if I need to scale this to merge even more files.

Upvotes: 1

Views: 818

Answers (1)

jezrael
jezrael

Reputation: 862511

Duplicate column names will cause this error.

So you can add parameter suffixes in function merge:

suffixes : 2-length sequence (tuple, list, ...)

Suffix to apply to overlapping column names in the left and right side, respectively

Overlapping value columns.

Upvotes: 1

Related Questions