I'm trying to append the data frame values as rows, but it's appending them as columns. I have 32 files, and I would like to take the second column (called dataset_code) from each and append it. But it's creating 32 rows and 101 columns. I would like 1 column and 3232 rows.
import pandas as pd
import os
source_directory = r'file_path'
df_combined = pd.DataFrame(columns=["dataset_code"])
for file in os.listdir(source_directory):
    if file.endswith(".csv"):
        #Read the new CSV to a dataframe.
        df = pd.read_csv(source_directory + '\\' + file)
        df = df["dataset_code"]
        df_combined=df_combined.append(df)
print(df_combined)
Upvotes: 1
Views: 4503
Reputation: 13913
You already have two perfectly good answers, but let me make a couple of recommendations.
If you only need the dataset_code column, tell pd.read_csv directly (usecols=['dataset_code']) instead of loading the whole file into memory only to subset the dataframe immediately.
Instead of appending inside the loop, collect the dataframes in a list and concatenate them once at the end. Appending to a DataFrame is costly (it has to create a whole new one), so your approach creates 65 DataFrames: one at the beginning, one when reading each file, one when appending each of the latter, and maybe even 32 more with the subsetting. The approach I am proposing only creates 33 of them, and is the common idiom for this kind of importing.
Here is the code:
import os
import pandas as pd
source_directory = r'file_path'
dfs = []
for file in os.listdir(source_directory):
    if file.endswith(".csv"):
        # Build the path portably and read only the column we need.
        df = pd.read_csv(os.path.join(source_directory, file),
                         usecols=['dataset_code'])
        dfs.append(df)
# Concatenate once, after the loop, instead of appending per file.
df_combined = pd.concat(dfs)
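For anyone who wants to try the idiom without 32 real files, here is a minimal, self-contained sketch. The helper name combine_column, the temporary directory, and the sample values are all made up for illustration; ignore_index=True is an optional extra, not part of the code above, that renumbers the rows so each file's own 0..k-1 index is not repeated.

```python
import os
import tempfile

import pandas as pd


def combine_column(source_directory, column="dataset_code"):
    """Read one column from every CSV in a directory and stack the rows.

    Hypothetical helper, written only to illustrate the list-then-concat idiom.
    """
    dfs = []
    for file in sorted(os.listdir(source_directory)):
        if file.endswith(".csv"):
            path = os.path.join(source_directory, file)
            dfs.append(pd.read_csv(path, usecols=[column]))
    # ignore_index=True renumbers the combined rows 0..n-1.
    return pd.concat(dfs, ignore_index=True)


# Made-up sample data: three small CSV files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        sample = pd.DataFrame({"id": [1, 2], "dataset_code": [f"a{i}", f"b{i}"]})
        sample.to_csv(os.path.join(tmp, f"file{i}.csv"), index=False)
    combined = combine_column(tmp)

print(combined.shape)
```

With three files of two rows each, the result should have six rows and a single dataset_code column, which is the shape the question asks for.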
Upvotes: 7
Reputation: 107567
Alternatively, you can select the column with double square brackets, which returns a DataFrame instead of a Series:
df = df[["dataset_code"]]
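A quick way to see the difference between the two selections, using a made-up three-row frame in place of one of the CSV files:

```python
import pandas as pd

# Hypothetical sample frame standing in for one of the CSV files.
df = pd.DataFrame({"id": [1, 2, 3], "dataset_code": ["a", "b", "c"]})

as_series = df["dataset_code"]    # single brackets: a pandas Series
as_frame = df[["dataset_code"]]   # double brackets: a one-column DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The Series is one-dimensional, while the double-bracket selection keeps the two-dimensional, one-column layout that stacks row-wise.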
Upvotes: 3
Reputation: 16629
df["dataset_code"] is a Series, not a DataFrame. Since you want to append one DataFrame to another, you need to change the Series object to a DataFrame object.
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
>>> type(df['dataset_code'])
<class 'pandas.core.series.Series'>
To make the conversion, do this:
df = df["dataset_code"].to_frame()
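As a rough check of the conversion, the sketch below uses two made-up frames in place of the real files. It stacks the converted columns with pd.concat rather than DataFrame.append, since append was removed in pandas 2.0; that substitution is mine, not part of the code in the question.

```python
import pandas as pd

# Hypothetical stand-ins for two of the CSV files.
first = pd.DataFrame({"id": [1, 2], "dataset_code": ["a", "b"]})
second = pd.DataFrame({"id": [3, 4], "dataset_code": ["c", "d"]})

# to_frame() turns each Series into a one-column DataFrame.
frames = [f["dataset_code"].to_frame() for f in (first, second)]

# The converted frame matches the double-bracket selection.
same = frames[0].equals(first[["dataset_code"]])

# Stacking the one-column frames yields rows, not extra columns.
stacked = pd.concat(frames, ignore_index=True)
print(same, stacked.shape)
```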
Upvotes: 3