Reputation: 77
Hello all i'm new to pandas,
I have a very inconvenient set of csv data like this:
PMSN01001_PFT0_20181212_Crop_AGE.jpg_OCR.csv
PMSN01001_PFT0_20181212_Crop_GENDER.jpg_OCR.csv
PMSN01001_PFT0_20181212_Crop_HEIGHT.jpg_OCR.csv
PMSN01001_PFT0_20181212_Crop_WEIGHT.jpg_OCR.csv
...
PMSN01002_PFT0_20181212_Crop_AGE.jpg_OCR.csv
PMSN01002_PFT0_20181212_Crop_GENDER.jpg_OCR.csv
PMSN01002_PFT0_20181212_Crop_HEIGHT.jpg_OCR.csv
PMSN01002_PFT0_20181212_Crop_WEIGHT.jpg_OCR.csv
...
each csv files has a simple, single string or number like the following:
PMSN01001*AGE*.csv 54
PMSN01001*GENDER*.csv male
PMSN01001*HEIGHT*.csv 171
PMSN01001*WEIGHT*.csv 65.4
PMSN01002*AGE*.csv 45
PMSN01002*GENDER*.csv female
PMSN01002*HEIGHT*.csv 141
PMSN01002*WEIGHT*.csv 87
Basically, I want to make the entire dataframe look like this where filename appear as rows and values as columns accordingly:
age gender height weight
PMSN*.csv 54 male 171 65.4
PMSN*.csv 52 female 181 54
how can I achieve this?
I think the key idea is to make an empty dataframe then use glob.glob() to read all csv files and somehow use open() to treat each filenames as rows. But I'm going in circles here...
csv_path = \mypath\
filenames = glob.glob(csv_path + '\*.csv')
many thanks in advance!!
Upvotes: 0
Views: 58
Reputation: 446
You can use the below code where header=0 means that the first row can be assigned as the column names after reading CSV.
import pandas as pd
import glob
path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
df = pd.read_csv(filename, index_col=None, header=0)
li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
If You Want Merge data in row then use it
csv_file_list = ["sample1.csv", "sample2.csv"]
list_of_dataframes = []
for filename in csv_file_list:
list_of_dataframes.append(pd.read_csv(filename))
merged_df = pd.concat(list_of_dataframes)
print(merged_df)
Upvotes: 1