user4974662

Python Pandas Dataframe Append Rows

I'm trying to append DataFrame values as rows, but they are being appended as columns. I have 32 files, and I would like to take the second column (called dataset_code) from each one and append it. Instead, the result has 32 rows and 101 columns. I would like 1 column and 3232 rows.

import pandas as pd
import os



source_directory = r'file_path'

df_combined = pd.DataFrame(columns=["dataset_code"])

for file in os.listdir(source_directory):
    if file.endswith(".csv"):
        # Read the new CSV to a dataframe.
        df = pd.read_csv(source_directory + '\\' + file)
        df = df["dataset_code"]
        df_combined = df_combined.append(df)



print(df_combined)

Upvotes: 1

Views: 4503

Answers (3)

Alicia Garcia-Raboso

Reputation: 13913

You already have two perfectly good answers, but let me make a couple of recommendations.

  1. If you only want the dataset_code column, pass usecols=['dataset_code'] to pd.read_csv instead of loading the whole file into memory only to subset the DataFrame immediately.
  2. Instead of appending to an initially empty DataFrame, collect a list of DataFrames and concatenate them in one go at the end. Appending rows to a pandas DataFrame is costly because each append builds a whole new DataFrame, so your approach creates 65 DataFrames: one at the beginning, one when reading each file, and one for each append (and maybe 32 more for the subsetting). The approach I am proposing creates only 33 of them, and it is the common idiom for this kind of import.

Here is the code:

import os
import pandas as pd

source_directory = r'file_path'

dfs = []
for file in os.listdir(source_directory):
    if file.endswith(".csv"):
        df = pd.read_csv(os.path.join(source_directory, file),
                         usecols=['dataset_code'])
        dfs.append(df)

df_combined = pd.concat(dfs)
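
For anyone who wants to try the list-and-concat idiom without real data, here is a minimal, self-contained sketch. The helper name combine_column, the temporary directory, and the three small demo files are assumptions made up for illustration; only the dataset_code column name comes from the question. It also passes ignore_index=True so the combined rows are numbered continuously instead of repeating each file's own index. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pd.concat is the way to go anyway.

```python
import os
import tempfile

import pandas as pd


def combine_column(source_directory, column="dataset_code"):
    """Stack one column from every CSV in a directory into a one-column DataFrame."""
    frames = []
    for name in sorted(os.listdir(source_directory)):
        if name.endswith(".csv"):
            path = os.path.join(source_directory, name)
            # usecols reads only the requested column from each file
            frames.append(pd.read_csv(path, usecols=[column]))
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)


# Hypothetical demo data: 3 small files with 4 rows each in a temporary directory
demo_dir = tempfile.mkdtemp()
for i in range(3):
    pd.DataFrame({
        "id": range(4),
        "dataset_code": [f"code_{i}_{j}" for j in range(4)],
    }).to_csv(os.path.join(demo_dir, f"file_{i}.csv"), index=False)

combined = combine_column(demo_dir)
print(combined.shape)  # (12, 1): one column, rows from all files stacked
```

With the 32 files from the question, the same function should return a single dataset_code column with all the rows stacked.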

Upvotes: 7

Parfait

Reputation: 107567

Alternatively, you can select the column with double square brackets, which returns a one-column DataFrame instead of a Series:

df = df[["dataset_code"]]
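
To see the difference between the two kinds of selection, here is a small sketch on a made-up three-row frame (the id column and the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for one of the CSV files
df = pd.DataFrame({"id": [1, 2, 3], "dataset_code": ["a", "b", "c"]})

single = df["dataset_code"]    # single brackets: a Series
double = df[["dataset_code"]]  # double brackets: a one-column DataFrame

print(type(single).__name__, single.shape)  # Series (3,)
print(type(double).__name__, double.shape)  # DataFrame (3, 1)
```

The one-column DataFrame keeps its column label, so stacking several of them lines the values up under dataset_code.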

Upvotes: 3

Nehal J Wani

Reputation: 16629

df["dataset_code"] is a Series, not a DataFrame. Since you want to append one DataFrame to another, you need to change the Series object to a DataFrame object.

>>> type(df)
<class 'pandas.core.frame.DataFrame'>
>>> type(df['dataset_code'])
<class 'pandas.core.series.Series'>

To make the conversion, do this:

df = df["dataset_code"].to_frame()
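
As a quick check of the conversion, the sketch below uses two made-up frames standing in for two of the CSV files (the id column and the values are assumptions). It converts each Series with to_frame() and then stacks the results row-wise with pd.concat:

```python
import pandas as pd

# Hypothetical stand-ins for two of the CSV files
df_a = pd.DataFrame({"id": [1, 2, 3], "dataset_code": ["a", "b", "c"]})
df_b = pd.DataFrame({"id": [4, 5, 6], "dataset_code": ["d", "e", "f"]})

frame_a = df_a["dataset_code"].to_frame()
frame_b = df_b["dataset_code"].to_frame()

# to_frame() gives the same result as double-bracket selection
print(frame_a.equals(df_a[["dataset_code"]]))  # True

# The one-column frames stack as rows under the shared column label
stacked = pd.concat([frame_a, frame_b], ignore_index=True)
print(stacked.shape)  # (6, 1)
```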

Upvotes: 3
