vesuvius
vesuvius

Reputation: 435

Automate the process of comparing the values of 2 csv files if value matches read the second csv into the DataFrame

I have imported an excel into a dataframe. It looks like this:

enter image description here

Then I used a code to groupby this data as per 'tx_id' and create separate csv with the name of tx_id which gives me data like this (3e6737ae-c3af-4d19-a645-d17fc73dbb7c.csv). This is the code:

for i, g in dframe.groupby('tx_id'):
     g.to_csv('{}.csv'.format(i.split('/')[0]), index=False)

Then I created a separate dframe containing only the tx_id and then dropped the duplicates using this code:

dframe1 = dframe1.drop_duplicates()

Now my dataframe looks like this:

enter image description here

I have converted this dataframe into csv. Now I want to compare the names of the csv file(which is the tx_id value) with the data present in the newly created csv and if the names match , I would like to read the csv file(which is the tx_id value) into the dataframe. I used to import these csv files manually but I have a large dataset , it's not feasible for me to read the data each time and do further process on it. Right now what I am doing is importing the csv files individually into a dataframe. I am using this code:

df = pd.read_csv(' ae229a81-bb33-4cf1-ba2f-360fffb0d94b.csv')

This gives me the result like:

enter image description here

Then I used to unstack it and apply value_counts by using this code:

df1 = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)

And the end result used to look like this:

enter image description here

I want to automate this process and I don't know how. Can you guys help me?

Upvotes: 3

Views: 105

Answers (1)

Chris
Chris

Reputation: 29742

You can iterate your tx_id and append the dataframes to list:

import pandas as pd

dfs = []
for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv('%s.csv' % tx))

This only works if it's executed in the same directory as the csv files. Otherwise:

import os
import pandas

dfs = []

for tx in dframe1['tx_id']:
    dfs.append(pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx)))

Editted

Instead of appending dataframe directly, if you want to apply some functions:

for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    dfs.append(df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0))

Now your dfs has all the value_counts() results. You can refer them using indices.

If you want to find them using the filename, use dict:

df_dict = dict()
for tx in dframe1['tx_id']:
    df = pd.read_csv(os.path.join('/path/to/csv/', '%s.csv' % tx))
    df_dict[tx] = df.groupby('rule_id')['request_id'].value_counts().unstack().fillna(0)

Upvotes: 1

Related Questions