Won Chul Chung
Won Chul Chung

Reputation: 77

Pandas: Ideas on how to design dataframe and append multiple csv files

Hello all i'm new to pandas,

I have a very inconvenient set of csv data like this:

PMSN01001_PFT0_20181212_Crop_AGE.jpg_OCR.csv
PMSN01001_PFT0_20181212_Crop_GENDER.jpg_OCR.csv
PMSN01001_PFT0_20181212_Crop_HEIGHT.jpg_OCR.csv
PMSN01001_PFT0_20181212_Crop_WEIGHT.jpg_OCR.csv
...
PMSN01002_PFT0_20181212_Crop_AGE.jpg_OCR.csv
PMSN01002_PFT0_20181212_Crop_GENDER.jpg_OCR.csv
PMSN01002_PFT0_20181212_Crop_HEIGHT.jpg_OCR.csv
PMSN01002_PFT0_20181212_Crop_WEIGHT.jpg_OCR.csv
...

each csv files has a simple, single string or number like the following:

PMSN01001*AGE*.csv       54
PMSN01001*GENDER*.csv    male
PMSN01001*HEIGHT*.csv    171
PMSN01001*WEIGHT*.csv    65.4
PMSN01002*AGE*.csv       45
PMSN01002*GENDER*.csv    female
PMSN01002*HEIGHT*.csv    141
PMSN01002*WEIGHT*.csv    87

Basically, I want to make the entire dataframe look like this where filename appear as rows and values as columns accordingly:

                age    gender  height  weight      
PMSN*.csv        54     male    171    65.4
PMSN*.csv        52     female  181    54

how can I achieve this?

I think the key idea is to make an empty dataframe then use glob.glob() to read all csv files and somehow use open() to treat each filenames as rows. But I'm going in circles here...

csv_path = \mypath\  

filenames = glob.glob(csv_path + '\*.csv')

many thanks in advance!!

Upvotes: 0

Views: 58

Answers (1)

Cognisun Inc
Cognisun Inc

Reputation: 446

You can use the below code where header=0 means that the first row can be assigned as the column names after reading CSV.

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path

all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:

    df = pd.read_csv(filename, index_col=None, header=0)

    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

If You Want Merge data in row then use it

csv_file_list = ["sample1.csv", "sample2.csv"]

list_of_dataframes = []
for filename in csv_file_list:
    list_of_dataframes.append(pd.read_csv(filename))

merged_df = pd.concat(list_of_dataframes)

print(merged_df)

Upvotes: 1

Related Questions