Reputation: 497
My code is grabbing multiple csv files from a directory, and putting all the data into a dataFrame I created and called "df". Each CSV is the same format, but can be of various lengths so this is what I want to do:
I want to have a column in my df (DataFrame) that records the second to last piece of data in each csv I pull in before it moves onto the next one. I have modified the output below to give you an example of what I mean. Let's suppose I call this column BeforeLast. When you see a 0 value, that means its not the second to last piece of data in the csv I pulled, if you see a 1 value it means its the second to last piece of data in the csv I pulled.
How can I do this as Python is pulling in each csv called upon?
import pandas as pd
import glob
import os
path =r'X:\PublicFiles\TradingData\CSV\RealMarkets\Weekly\Futures\Contracts\Corn C'
allFiles = glob.glob(path + "/*.csv") ##'*' means any file name can be grabbed
df = pd.DataFrame()
list_ = []
for file_ in allFiles:
names = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI']
df = pd.read_csv(file_, index_col = None, names = names)
list_.append(df)
frame = pd.concat(list_)
Here is a sample of my current dataFrame (df)
Date Open High Low Close Vol OI
0 20141212 427.00 427.00 427.00 427.00 0 0
1 20141219 429.00 429.00 424.00 424.00 0 0
2 20141226 424.00 425.00 423.00 425.00 0 0
3 20150102 422.75 422.75 417.50 417.50 0 0
This is what I want
Date Open High Low Close Vol OI BeforeLast
0 20141212 427.00 427.00 427.00 427.00 0 0 0
1 20141219 429.00 429.00 424.00 424.00 0 0 0
2 20141226 424.00 425.00 423.00 425.00 0 0 1
3 20150102 422.75 422.75 417.50 417.50 0 0 0 (this is the last piece of data in this csv and now it moves on to the next)
4 20141226 424.00 425.00 423.00 425.00 0 0 0
5 20150102 422.75 422.75 417.50 417.50 0 0 0
6 20141226 424.00 425.00 423.00 425.00 0 0 1
7 20150102 422.75 422.75 417.50 417.50 0 0 0
Upvotes: 0
Views: 893
Reputation: 51
Try this. You do not need a list. Just append to the original data frame.
.iloc[-2, -1] is the 2nd to last row, last col
I added a index reset as in my test I ran into duplicate index numbers.
import pandas as pd
import glob
import os
path =r'X:\PublicFiles\TradingData\CSV\RealMarkets\Weekly\Futures\Contracts\Corn C'
allFiles = glob.glob(path + "/*.csv") ##'*' means any file name can be grabbed
df = pd.DataFrame()
for file_ in allFiles:
names = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI']
df_temp = pd.read_csv(file_, index_col = None, names = names)
df_temp['beforelast'] = 0
df_temp.iloc[-2,-1] = 1
df = df.append(df_temp)
df = df.reset_index(drop=True)
Upvotes: 2
Reputation: 829
df = pd.DataFrame({'a': np.zeros(5)})
df[-2:-1] = 1
print df
a
0 0
1 0
2 0
3 1
4 0
You can use this when you create each dataframe?
Example in your code:
for file_ in allFiles:
names = ['Date', 'Open', 'High', 'Low', 'Close', 'Vol', 'OI']
df = pd.read_csv(file_, index_col = None, names = names)
before = np.zeros(len(df))
before[-2] = 1
df['before'] = before
list_.append(df)
frame = pd.concat(list_)
Upvotes: 0
Reputation: 3221
Just create a list to keep track of the last column when you are building your dataframe:
import pandas as pd
df = pd.DataFrame()
newcol = []
for i in range(10):
# Load 10 files and get shape
# length = df.shape[0]
length = 10
c = [0 for i in range(length)]
c[-2] = 1
newcol += c
df['BeforeLast'] = newcol
print df
Upvotes: 0