Reputation: 903
I have multiple files (possibly 20 or more) to merge based on certain criteria. Three example files are shown below:
File1
ID Type Condition Colour
113884 M Good Green
123456 M Good Blue
178101 U Good Yellow
245645 U Good Red
256178 X Fair Green
803989 X Poor Red
File2
ID Type Condition Colour
113884 M Good Green
123456 M Good Blue
172221 M Poor Red
178101 U Good Yellow
256178 X Fair Green
File3
ID Type Condition Colour
113884 M Good Green
123456 M Good Blue
172221 M Poor Red
178101 U Good Yellow
178101 U Good Yellow
256178 X Fair Green
286762 Q Good Purple
I want to extract and merge information from these files as in the required output below. From every file, each row whose 'Type' is M or Q is extracted along with its ID; in the output file the file names become columns containing 'Yes' or 'No' depending on whether that ID and Type appear in the file.
Required output (.csv):
ID Type File1 File2 File3
113884 M Yes Yes Yes
123456 M Yes Yes Yes
172221 M No Yes Yes
286762 Q No No Yes
Here's my insufficient attempt:
import os, glob

all_line = []
for file in glob.glob('*.txt'):          # 'golob' was a typo
    with open(file, 'r') as infile:      # open the file object, not the literal string 'file'
        for line in infile:              # iterate over the file handle, not the filename
            line = line.strip().split('\t')
            if line[1] == 'M' or line[1] == 'Q':
                all_line.append(line)
I'm not sure how to produce the merged output from here using Python or pandas. Can anyone help? Thanks.
Upvotes: 1
Views: 181
Reputation: 210842
IIUC you can do it this way:
import os
import glob
import pandas as pd

files = glob.glob(r'D:\temp\.data\File*.csv')

def merge_files(files, **kwargs):
    dfs = []
    for f in files:
        dfs.append(
            pd.read_csv(f, delim_whitespace=True, usecols=['ID', 'Type'])
              .query("Type in ['M','Q']")     # keep only rows whose Type is M or Q
              .drop_duplicates()              # File3 contains a duplicated row
              .assign(col=0)                  # dummy column marking "present in this file"
              .rename(columns={'col': os.path.splitext(os.path.basename(f))[0]})
              .set_index(['ID', 'Type'])
        )
    # outer-join all per-file frames on (ID, Type); IDs missing from a file become NaN -> False
    return pd.concat(dfs, axis=1).notnull()

result = merge_files(files).reset_index()
print(result)
Output:
ID Type File1 File2 File3
0 113884 M True True True
1 123456 M True True True
2 172221 M False True True
3 286762 Q False False True
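If you need the literal Yes/No strings and a .csv file as in the required output, the boolean result can be converted afterwards. A minimal sketch (the output filename merged.csv is my own choice, not from the question):

# Convert only the boolean file-indicator columns to 'Yes'/'No' and write a CSV.
bool_cols = result.select_dtypes(bool).columns
result[bool_cols] = result[bool_cols].replace({True: 'Yes', False: 'No'})
result.to_csv('merged.csv', index=False)   # 'merged.csv' is an example path

Note also that newer pandas versions deprecate delim_whitespace=True in read_csv; sep=r'\s+' is the equivalent.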
Upvotes: 4