Merging CSV Using pandas dataframe

Question

I am using the code below. All my CSV files have the same structure, but the resulting dataframe contains two date columns where the CSV files have only one.

For a few rows the date value is in the first date column, while for the rest of the data it ends up in the second date column.

Any idea why two date columns are generated for a single column in the source CSV files?

import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df, ignore_index=True)

In [76]: all_data.columns
Out[76]:
Index(['0', '0.1', 'Channel_ID', 'Date', 'Date ', 'Duration (HH:MM)',
       'Episode #', 'Image', 'Language', 'Master House ID', 'Parental Rating',
       'Program Category', 'Program Title', 'StartTime_ET', 'StartTime_ET2',
       'Synopsis'],
      dtype='object')

EdChum · Accepted Answer

Because the name of the second date column has a trailing space:

'Date', 'Date '
             ^
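
To see the effect in isolation, here is a minimal sketch using two made-up in-memory CSV files. The data and the names csv_a, csv_b and combined are illustrative assumptions, not taken from the question; only the trailing space in the second header mirrors the real files.

```python
import io
import pandas as pd

# Two hypothetical CSV files; the second header has a trailing space after "Date".
csv_a = "Date,Channel_ID\n2015-01-01,1\n"
csv_b = "Date ,Channel_ID\n2015-01-02,2\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (csv_a, csv_b)]

# Rows are stacked by matching column names, so 'Date' and 'Date ' remain
# two separate columns, each holding NaN for rows from the other file.
combined = pd.concat(frames, ignore_index=True)
print(list(combined.columns))
print(combined)
```

Each row has a value in only one of the two date columns, which matches the behaviour described in the question.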

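Columns are matched by name when appending, so files whose header reads 'Date ' fill a different column from files whose header reads 'Date'. You need to normalise the column names before appending: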

import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    # strip whitespace from the header so 'Date ' becomes 'Date'
    df.columns = df.columns.str.strip()
    all_data = all_data.append(df, ignore_index=True)

Here I use str.strip to remove any leading and trailing whitespace from the column names.
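
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent pandas the loop needs a different way to combine the frames. One possible sketch collects the cleaned frames in a list and calls pd.concat once at the end. The function name merge_csv_files and its pattern parameter are hypothetical, introduced here only for illustration.

```python
import glob
import pandas as pd

def merge_csv_files(pattern):
    """Read every CSV matching `pattern` and stack the rows into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        # Normalise the header so 'Date ' and 'Date' map to the same column.
        df.columns = df.columns.str.strip()
        frames.append(df)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the path from the question this would be called as merge_csv_files("/Users/tcssig/Desktop/Files/*.csv"). Concatenating once at the end also avoids copying the accumulated data on every iteration.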
