I would like to iterate through a folder, find the CSV files whose names contain a given tag (e.g. "usr666"), load only selected columns from each into a pandas DataFrame, and merge them into one file, as in the following example:
BT_usr666.csv:
number|size|person|car  |
-------------------------
31    |2   |Ringo |Tesla|
82    |3   |Paul  |Audi |
93    |2   |John  |BMW  |
74    |3   |George|MG   |
RS_usr666.csv:
number|color|person|doors|car    |
----------------------------------
33    |black|Mick  |2    |Porsche|
12    |red  |Keith |4    |Saab   |
55    |blue |Ron   |6    |Volvo  |
should be merged into FINAL_usr666.csv:
person|car    |
---------------
Ringo |Tesla  |
Paul  |Audi   |
John  |BMW    |
George|MG     |
Mick  |Porsche|
Keith |Saab   |
Ron   |Volvo  |
Any ideas?
You can try the following script.
Code
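import glob
import os
import pandas as pd

def get_final_df(files, columns=('person', 'car')):
    # Read only the wanted columns from each file.
    frames = [pd.read_csv(file, usecols=list(columns)) for file in files]
    if not frames:
        return pd.DataFrame(columns=list(columns))  # no matching files found
    # DataFrame.append was removed in pandas 2.0; concatenate all pieces once instead.
    return pd.concat(frames, ignore_index=True)

if __name__ == '__main__':
    wd = os.getcwd()  # I've set this as working dir, you can change the path to your files.
    # The pattern applies only to the file name, so a directory path that happens
    # to contain 'usr666' can no longer make every file match.
    files = sorted(glob.glob(os.path.join(wd, '*usr666*.csv')))
    final_df = get_final_df(files)
    final_df.to_csv('final_df.csv', index=False)  # Write to file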
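To check this approach against the data in the question, here is a minimal, self-contained sketch. It assumes the files are ordinary comma-separated CSVs (the pipe layout in the question is taken to be just how the tables were displayed), and the helper names `write_example_files` and `merge_usr_files` are hypothetical. It uses `pd.concat` rather than `DataFrame.append`, which was removed in pandas 2.0, and it works in a temporary directory so nothing in your own folder is touched.

```python
import glob
import os
import tempfile

import pandas as pd


def write_example_files(folder):
    """Hypothetical helper: recreate the question's two example files as comma-separated CSVs."""
    pd.DataFrame({
        "number": [31, 82, 93, 74],
        "size": [2, 3, 2, 3],
        "person": ["Ringo", "Paul", "John", "George"],
        "car": ["Tesla", "Audi", "BMW", "MG"],
    }).to_csv(os.path.join(folder, "BT_usr666.csv"), index=False)
    pd.DataFrame({
        "number": [33, 12, 55],
        "color": ["black", "red", "blue"],
        "person": ["Mick", "Keith", "Ron"],
        "doors": [2, 4, 6],
        "car": ["Porsche", "Saab", "Volvo"],
    }).to_csv(os.path.join(folder, "RS_usr666.csv"), index=False)


def merge_usr_files(folder, tag="usr666", columns=("person", "car")):
    """Hypothetical helper: read only `columns` from every *<tag>*.csv in `folder` and stack them."""
    # sorted() fixes the order, so BT_usr666.csv is read before RS_usr666.csv.
    files = sorted(glob.glob(os.path.join(folder, f"*{tag}*.csv")))
    frames = [pd.read_csv(f, usecols=list(columns)) for f in files]
    # ignore_index=True renumbers the combined rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_example_files(tmp)
        final_df = merge_usr_files(tmp)
        final_df.to_csv(os.path.join(tmp, "FINAL_usr666.csv"), index=False)
        print(final_df)
```

Running it should print the seven person/car rows in the same order as FINAL_usr666.csv in the question. Note that `usecols` keeps the columns in file order, not list order; that doesn't matter here because `person` precedes `car` in both files.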
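This could do it.
It searches the current directory (".") for CSV files whose names contain usr666, keeps only the person and car columns, and writes the combined result to file1.csv:
import os
import pandas as pd

frames = []
for filename in sorted(os.listdir(".")):
    # The example files are BT_usr666.csv and RS_usr666.csv, so test for the
    # substring; startswith("usr666") would match neither of them.
    if "usr666" in filename and filename.endswith(".csv"):
        y = pd.read_csv(filename)
        frames.append(y[["person", "car"]])  # keep only the wanted columns
# DataFrame.append was removed in pandas 2.0; concatenate the pieces once instead.
x = pd.concat(frames, ignore_index=True)
x.to_csv('file1.csv', index=False)  # don't write the row index as an extra column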
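Since the question says "e.g." usr666, the folder may hold files for several users. One possible extension of both answers is to group the files by their usr tag and write one FINAL_ file per tag. This is a sketch that assumes every input is named `<PREFIX>_usr<digits>.csv`, as in the example; `merge_by_user` and the regular expression `USER_RE` are hypothetical, not part of either answer.

```python
import re
from collections import defaultdict
from pathlib import Path

import pandas as pd

# Assumption: input names look like <PREFIX>_usr<digits>.csv, e.g. BT_usr666.csv.
USER_RE = re.compile(r"_(usr\d+)\.csv$")


def merge_by_user(folder, columns=("person", "car")):
    """Write FINAL_<tag>.csv for every usr tag found in `folder`; return {tag: merged DataFrame}."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*.csv")):
        # Skip earlier outputs, so re-running doesn't merge FINAL_ files into themselves.
        if path.name.startswith("FINAL_"):
            continue
        match = USER_RE.search(path.name)
        if match:
            groups[match.group(1)].append(path)

    merged = {}
    for tag, paths in groups.items():
        frames = [pd.read_csv(p, usecols=list(columns)) for p in paths]
        df = pd.concat(frames, ignore_index=True)
        df.to_csv(Path(folder) / f"FINAL_{tag}.csv", index=False)
        merged[tag] = df
    return merged
```

Skipping names that start with FINAL_ matters because FINAL_usr666.csv itself matches the pattern; without the check, a second run would fold the previous output back into the merge.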