Reputation: 435
I have imported an excel into a dataframe. It looks like this:
Then I used a code to groupby this data as per 'tx_id' and create separate csv with the name of tx_id which gives me data like this (3e6737ae-c3af-4d19-a645-d17fc73dbb7c.csv). This is the code:
for i, g in dframe.groupby('tx_id'):
g.to_csv('{}.csv'.format(i.split('/')[0]), index=False)
Then I created a separate dframe containing only the tx_id and then dropped the duplicates using this code:
dframe1 = dframe1.drop_duplicates()
Now my dataframe looks like this:
I have converted this dataframe into csv. Now I want to compare the names of the csv file(which is the tx_id value) with the data present in the newly created csv and if the names match , I would like to read the csv file(which is the tx_id value) into the dataframe. I used to import these csv files manually but I have a large dataset , it's not feasible for me to read the data each time and do further process on it. Right now what I am doing is importing the csv files individually into a dataframe. I am using this code:
df = pd.read_csv(' ae229a81-bb33-4cf1-ba2f-360fffb0d94b.csv')
This gives me the result like:
Then I used to unstack it and apply value_counts by using this code:
df1 = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
And the end result used to look like this:
I want to automate this process and I don't know how. Can you guys help me?
Upvotes: 3
Views: 105
Reputation: 29742
You can iterate your tx_id
and append the dataframes to list
:
import pandas as pd
dfs = []
for tx in dframe1['tx_id']:
dfs.append(pd.read_csv('%s.csv' % tx))
This only works if it's executed in the same directory as the csv files. Otherwise:
import os
import pandas
dfs = []
for tx in dframe1['tx_id']:
dfs.append(pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx)))
Editted
Instead of appending dataframe directly, if you want to apply some functions:
for tx in dframe1['tx_id']:
df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
dfs.append(df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0))
Now your dfs
has all the value_counts()
results. You can refer them using indices.
If you want to find them using the filename, use dict
:
df_dict = dict()
for tx in dframe1['tx_id']:
df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
df_dict[tx] = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)
Upvotes: 1