Reputation: 2001
I am using the code below. All my CSV files have the same structure, with a single date column, but the combined dataframe contains two date columns.
For a few rows the date value is in the first date column, while for the rest of the data it is in the second date column.
Any idea why two date columns are being generated from one column in the source CSV files?
import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df, ignore_index=True)
In [76]: all_data.columns
Out[76]:
Index(['0', '0.1', 'Channel_ID', 'Date', 'Date ', 'Duration (HH:MM)',
       'Episode #', 'Image', 'Language', 'Master House ID', 'Parental Rating',
       'Program Category', 'Program Title', 'StartTime_ET', 'StartTime_ET2',
       'Synopsis'],
      dtype='object')
Upvotes: 1
Views: 88
Reputation: 394389
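Because the second column name has a trailing space:
'Date', 'Date '
             ^
pandas treats 'Date' and 'Date ' as two different columns, so some of your files must have the trailing space in the header and others not. You need to normalise the column names before appending: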
import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("/Users/tcssig/Desktop/Files/*.csv"):
    df = pd.read_csv(f)
    # strip leading/trailing whitespace from every column name
    df.columns = df.columns.str.strip()
    all_data = all_data.append(df, ignore_index=True)
Here I use str.strip to remove any leading and trailing whitespace from the column names.
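As a side note, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent version the same idea can be written with pd.concat. Below is a minimal sketch of that variant. The two in-memory CSV strings (csv_a, csv_b) and their contents are made up for illustration and stand in for your real files; only the header whitespace problem is taken from the question.

```python
import io
import pandas as pd

# Two hypothetical in-memory "files" (assumed data, not from the question).
# The second header has a trailing space after "Date".
csv_a = "Date,Channel_ID\n2017-01-01,1\n"
csv_b = "Date ,Channel_ID\n2017-01-02,2\n"

frames = []
for raw in (csv_a, csv_b):
    df = pd.read_csv(io.StringIO(raw))
    # Normalise the header so 'Date ' and 'Date' become the same column.
    df.columns = df.columns.str.strip()
    frames.append(df)

# Concatenate once at the end instead of appending inside the loop.
all_data = pd.concat(frames, ignore_index=True)
print(list(all_data.columns))
```

With real files you would loop over glob.glob(...) and pass each path to pd.read_csv, as in the code above. Collecting the frames in a list and concatenating once also avoids copying the growing dataframe on every iteration.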
Upvotes: 5