Sarang Manjrekar

Reputation: 2001

Merging CSV files using a pandas DataFrame

I am using the code below. All my CSV files have the same structure, but the resulting dataframe contains two date columns.

For a few rows the date value is in the first date column, while for the rest of the data it goes into the second date column.

Any idea why two date columns are being generated from a single column in the source CSV files?

import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df,ignore_index=True)

In [76]: all_data.columns
Out[76]:
Index(['0', '0.1', 'Channel_ID', 'Date', 'Date ', 'Duration (HH:MM)',
       'Episode #', 'Image', 'Language', 'Master House ID', 'Parental Rating',
       'Program Category', 'Program Title', 'StartTime_ET', 'StartTime_ET2',
       'Synopsis'],
      dtype='object')

Upvotes: 1

Views: 88

Answers (1)

EdChum

Reputation: 394389

Because the name of the second date column has a trailing space, pandas treats it as a different column:

'Date', 'Date '
             ^

So you need to normalise the column names before appending:

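import glob
import pandas as pd

frames = []
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    # Strip whitespace from the headers so 'Date ' and 'Date' become one column
    df.columns = df.columns.str.strip()
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
all_data = pd.concat(frames, ignore_index=True)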

Here I use str.strip to remove any leading and trailing whitespace from the column names.
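
If you want to see the effect without touching your real files, here is a minimal sketch. The two in-memory CSV strings and the `combine` helper are made up for illustration and are not from the question; the only assumption is that one of the headers carries a trailing space, as in `'Date '` above.

```python
import io

import pandas as pd

# Hypothetical sample data: the second header has a trailing space after "Date".
csv_a = "Date,Program Title\n2016-01-01,News\n"
csv_b = "Date ,Program Title\n2016-01-02,Sports\n"


def combine(csv_texts, strip_headers):
    """Read each CSV text and stack the rows, optionally stripping header whitespace."""
    frames = []
    for text in csv_texts:
        df = pd.read_csv(io.StringIO(text))
        if strip_headers:
            df.columns = df.columns.str.strip()
        frames.append(df)
    # Columns are aligned by label, so 'Date' and 'Date ' stay separate unless stripped.
    return pd.concat(frames, ignore_index=True)


raw = combine([csv_a, csv_b], strip_headers=False)
clean = combine([csv_a, csv_b], strip_headers=True)

print(list(raw.columns))    # contains both 'Date' and 'Date '
print(list(clean.columns))  # a single 'Date' column
```

Without stripping, each row has a value in only one of the two date columns and NaN in the other, which matches what the question describes. To find out which files are responsible, checking `[c for c in df.columns if c != c.strip()]` inside the loop should list the offending headers.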

Upvotes: 5
