McM

Reputation: 471

How to build a dataframe row by row, where each row comes from a different csv?

I have searched through perhaps a dozen variations of the question "How to build a dataframe row by row", but none of the solutions have worked for me. So although this is a frequently asked question, I believe my case differs enough to justify asking again. I think the problem might be that I am grabbing each row from a different csv. This code shows that I am successfully creating a one-row dataframe on each pass through the loop:

onlyfiles = list_of_csvs 
for idx, f in enumerate(onlyfiles):
    row = pd.read_csv(mypath + f,sep="|").iloc[0:1]

But the rows are individual dataframes and cannot be combined (so far). I have attempted the following:

df = pd.DataFrame()
for idx, f in enumerate(onlyfiles):
    row = pd.read_csv(mypath + f,sep="|").iloc[0:1]
    df.iloc(idx) = row

Which returns

    df.loc(idx) = row
    ^
SyntaxError: can't assign to function call

I think the problem is that each row, or dataframe, has its own headers. I've also tried df.loc(idx) = row[1], but that doesn't work either (where we grab row[:] when idx = 0). Neither iloc(idx) nor loc(idx) works.

In the end, I want one dataframe that has the header (column names) from the first data frame, and then n rows where n is the number of files.

Upvotes: 1

Views: 266

Answers (1)

yulGM

Reputation: 1094

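Try pd.concat(). The SyntaxError itself comes from the parentheses: .loc and .iloc are indexed with square brackets (df.loc[idx]), so df.loc(idx) = row is an assignment to a function call. Even with brackets, .iloc cannot add rows to an empty dataframe, so concatenating the one-row dataframes is the simpler route.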

Note that you can read just the first data row from each file directly, instead of reading the whole file and then slicing off the first row: pass nrows=1 to pd.read_csv.

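import pandas as pd

onlyfiles = list_of_csvs
df_joint = pd.DataFrame()
for f in onlyfiles:  # iterate over the names; enumerate() would yield (index, name) tuples
    # nrows=1 reads only the first data row after the header
    df_ = pd.read_csv(mypath + f, sep="|", nrows=1)
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0
    df_joint = pd.concat([df_joint, df_], ignore_index=True)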
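If there are many files, calling pd.concat() inside the loop copies the growing dataframe on every pass. A common alternative is to collect the one-row dataframes in a list and concatenate once at the end. Below is a minimal sketch of that idea, not a definitive implementation: the helper name first_rows, the temporary directory, and the sample files a.csv and b.csv are hypothetical and exist only to make the example runnable. It also assumes every file has the same header.

```python
import os
import tempfile

import pandas as pd


def first_rows(paths, sep="|"):
    """Return one DataFrame holding the first data row of each file."""
    # nrows=1 stops parsing after the first data row of each file.
    frames = [pd.read_csv(p, sep=sep, nrows=1) for p in paths]
    # A single concat avoids re-copying the growing frame on every iteration;
    # ignore_index=True renumbers the rows 0..n-1.
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data, written to a temporary directory for the demo.
tmpdir = tempfile.mkdtemp()
samples = {"a.csv": "x|y\n1|2\n3|4\n", "b.csv": "x|y\n5|6\n7|8\n"}
paths = []
for name, text in samples.items():
    path = os.path.join(tmpdir, name)
    with open(path, "w") as fh:
        fh.write(text)
    paths.append(path)

df_joint = first_rows(paths)
print(df_joint)
```

With your own data, the call would be something like first_rows([mypath + f for f in onlyfiles]). Note that pd.concat() raises a ValueError if the list of files is empty.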

Upvotes: 1
