Reputation: 1063
I have a large dataframe of transactions which I want to break into two smaller dataframes based on a certain column ("Type"). If "Type" is "S" then add that entire row to the "cust_sell" dataframe, and if "Type" is "P" to the "cust_buy" dataframe. I am using a for loop, but this is only adding the index value to the dataframe. Any help is appreciated!
from win32com.shell import shell, shellcon
import pandas as pd
filename = (shell.SHGetFolderPath(0, shellcon.CSIDL_PERSONAL, None, 0)) + '\MSRB T-1_test.xlsx'
wb = pd.read_excel(filename, sheet_name='T1-20062017', index_col=0, header=0)
cust_buy = []
cust_sell = []
# Create a list of customer buys and sells separately
for i in wb.index:
if wb['Type'][i] == 'S':
cust_sell.append([i])
elif wb['Type'][i] == 'P':
cust_buy.append([i])
Upvotes: 8
Views: 66811
Reputation: 71
Like @trollster said, it is indeed better to create dataframes for cust_sell and cust_buy. But let's understand what is not working with your code. When you do:
for i in wb.index
it means i will take the values of wb.index. And when you print wb.index, you get:
Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')
Meaning i will take the values 1,2,3 ... So when you do:
cust_sell.append([i])
what you are doing is adding to the list_cust_sell a list with inside a single element, i (an integer). If you want to add the entire row, you should use:
cust_sell.append(list(wb.loc[i,:]))
You will end up with a list of lists, each one containing a row
Upvotes: 3
Reputation: 164673
Using dict
+ groupby
you can create a dictionary of dataframes. This solution does not require you to manually specify all unique types and is more easily extendable than a manual loop.
Data from @trollster.
res = dict(tuple(mainDf.groupby('Type')))
{'P': Type Dummy
3 P 4
4 P 5
6 P 7,
'S': Type Dummy
0 S 1
1 S 2
2 S 3
5 S 6
7 S 8}
Upvotes: 2
Reputation: 6639
You do not need to write loops. You can do it easily with pandas.
Assuming your dataframe looks like this:
import pandas as pd
mainDf = pd.DataFrame()
mainDf['Type'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
mainDf['Dummy'] = [1, 2, 3, 4, 5, 6, 7, 8]
To create dataframe for S and P types, you can just do this:
cust_sell = mainDf[mainDf.Type == 'S']
cust_buy = mainDf[mainDf.Type == 'P']
cust_sell output:
Type Dummy
0 S 1
1 S 2
2 S 3
5 S 6
7 S 8
cust_buy output:
Type Dummy
3 P 4
4 P 5
6 P 7
Upvotes: 29