Reputation: 497
I have a question about using pd.read_csv. I am building a DataFrame from multiple CSV files in a folder, and the files are named like "C2__1979H" or "C2_1999Z".
I would like to set the index of my DataFrame to the name of the CSV file each row was read from, but I have yet to find a way to do that. Here is my current code
My DataFrame looks like this:
   Date      Open    High    Low     Close   Vol  OI  Roll
0  19780106  236.00  237.50  234.50  235.50  0    0   0
1  19780113  235.50  239.00  235.00  238.25  0    0   0
2  19780120  238.00  239.00  234.50  237.00  0    0   0
3  19780127  237.00  238.50  235.50  236.00  0    0   0
I want it to look like this:
           Date      Open    High    Low     Close   Vol  OI  Roll
C2__1979N  19780106  236.00  237.50  234.50  235.50  0    0   0
C2__1979N  19780113  235.50  239.00  235.00  238.25  0    0   0
C2__1979N  19780120  238.00  239.00  234.50  237.00  0    0   0
C2__1979Z  19780127  237.00  238.50  235.50  236.00  0    0   0     ## (assuming this is where the next CSV file began)
Upvotes: 2
Views: 5905
Reputation: 21888
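This does the trick: strip the extension with os.path.splitext, take the file name with os.path.basename, store it in a 'name' column, and set that column as the index.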
import os
import pandas as pd

df_temp = pd.DataFrame({'Close': [235.5, 238.25, 237.0, 236.0],
                        'Date': [19780106, 19780113, 19780120, 19780127],
                        'High': [237.5, 239.0, 239.0, 238.5],
                        'Low': [234.5, 235.0, 234.5, 235.5],
                        'OI': [0, 0, 0, 0],
                        'Open': [236.0, 235.5, 238.0, 237.0],
                        'Roll': [0, 0, 0, 0],
                        'Vol': [0, 0, 0, 0]})
frames = []
# To simulate several DataFrames, take two rows of df_temp per "file"
x = 0
for file_ in ['the_path/C2__1979N.csv', 'other_path/C2__1979H.csv']:
    # Drop the extension, then the directory part of the path
    filename, file_extension = os.path.splitext(file_)
    df_temp['name'] = os.path.basename(filename)
    # .loc slicing is inclusive, so x:x+1 selects two rows
    frames.append(df_temp.loc[x:x+1, :].copy())
    x += 2
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat(frames)
df.set_index('name', inplace=True)
df.index.name = None
print(df)
# Result
            Close      Date   High    Low  OI   Open  Roll  Vol
C2__1979N  235.50  19780106  237.5  234.5   0  236.0     0    0
C2__1979N  238.25  19780113  239.0  235.0   0  235.5     0    0
C2__1979H  237.00  19780120  239.0  234.5   0  238.0     0    0
C2__1979H  236.00  19780127  238.5  235.5   0  237.0     0    0
Applied to the original code:
names = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI', 'Roll']
frames = []
for file_ in allFiles:
    df_temp = pd.read_csv(file_, index_col=None, names=names)
    df_temp['Roll'] = 0
    df_temp.iloc[-2, -1] = 1
    # Drop the extension, then the directory part of the path
    filename, file_extension = os.path.splitext(file_)
    df_temp['name'] = os.path.basename(filename)
    frames.append(df_temp)
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat(frames, ignore_index=True)
df.set_index('name', inplace=True)
df.index.name = None
df = df[names]
df = df.drop_duplicates('Date')  # remove duplicate rows with the same date
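For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes a recent pandas and that the CSV files have no header row. The temporary folder, the sample rows, and the helper name load_with_file_index are made up for illustration and are not from the question.

```python
import glob
import os
import tempfile

import pandas as pd

NAMES = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI', 'Roll']


def load_with_file_index(folder):
    """Read every CSV in `folder`; index each row by its file's base name."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        frame = pd.read_csv(path, header=None, names=NAMES)
        # 'some/dir/C2__1979N.csv' -> 'C2__1979N'
        frame['name'] = os.path.splitext(os.path.basename(path))[0]
        frames.append(frame)
    df = pd.concat(frames).set_index('name')
    df.index.name = None  # hide the 'name' label above the index
    return df


# Hypothetical sample data: two small header-less CSV files.
samples = {
    'C2__1979H.csv': '19780106,236.00,237.50,234.50,235.50,0,0,0\n'
                     '19780113,235.50,239.00,235.00,238.25,0,0,0\n',
    'C2__1979N.csv': '19780120,238.00,239.00,234.50,237.00,0,0,0\n',
}
with tempfile.TemporaryDirectory() as folder:
    for file_name, text in samples.items():
        with open(os.path.join(folder, file_name), 'w') as handle:
            handle.write(text)
    df = load_with_file_index(folder)

print(df)
```

Collecting the frames in a list and concatenating once also avoids copying the growing DataFrame on every iteration.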
Upvotes: 2
Reputation: 5896
Have you tried the obvious one?
df_temp.index = [file_]*len(df_temp)
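Note that file_ is the full path, so the line above puts the path and extension into the index. A small sketch of the same one-liner using only the base name, with a hypothetical path and a cut-down df_temp for illustration:

```python
import os

import pandas as pd

# Hypothetical stand-ins for one iteration of the loop
df_temp = pd.DataFrame({'Date': [19780106, 19780113],
                        'Close': [235.50, 238.25]})
file_ = 'the_path/C2__1979N.csv'

# Strip the directory and the extension before using it as the label
label = os.path.splitext(os.path.basename(file_))[0]
df_temp.index = [label] * len(df_temp)

print(df_temp.index.tolist())  # ['C2__1979N', 'C2__1979N']
```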
Upvotes: 0