user13820515
user13820515

Reputation:

Comparing Data from two columns in CSV files

Appreciate any help on this one. I have 7 CSV files (all the same format) that that I have concatenated into one frame. My goal here is to compare two columns from the CSV's and find out how many times the word "Done" from the "Ran" column show up on each Date from the "Date" column. So far this is what I have written:

path = r'C:\Users\rock\Desktop\workspace\MTS_subs'          
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

counter = frame['Ran'].value_counts()
date_counter = frame['Date'].value_counts()


print(counter, date_counter)

this prints out the following:

Active    1739
Done       840
Name: Ran, dtype: int64 18/06/2020    402
19/06/2020    300
17/06/2020    266
25/06/2020    264
22/06/2020    224
16/06/2020    214
23/06/2020    208
24/06/2020    208
26/06/2020    184
15/06/2020    180
21/06/2020     76
14/06/2020     46
20/06/2020      4
13/06/2020      3
Name: Date, dtype: int64

So in all 7 CSVs, the word "Done" appears 840 times but I would like to find out how many times "Done" appears on each of those dates.

I've been scratching my head at this one for sometime. Any help or input is very much appreciated.

CSV Image

Upvotes: 1

Views: 73

Answers (1)

Roméo Després
Roméo Després

Reputation: 2163

(frame['Ran'] == 'Done').groupby(frame['Date']).sum() should do the trick. Below is an example that simulates the screenshot that was posted.

>>> frame = pd.DataFrame({
...     'Date': ['13/06/2020']*3 + ['15/06/2020']*2 + ['14/06/2020']*12,
...     'Ran': ['Done']*17
... })
>>> frame
          Date   Ran
0   13/06/2020  Done
1   13/06/2020  Done
2   13/06/2020  Done
3   15/06/2020  Done
4   15/06/2020  Done
5   14/06/2020  Done
6   14/06/2020  Done
7   14/06/2020  Done
8   14/06/2020  Done
9   14/06/2020  Done
10  14/06/2020  Done
11  14/06/2020  Done
12  14/06/2020  Done
13  14/06/2020  Done
14  14/06/2020  Done
15  14/06/2020  Done
16  14/06/2020  Done
>>> (frame['Ran'] == 'Done').groupby(frame['Date']).sum()
Date
13/06/2020     3.0
14/06/2020    12.0
15/06/2020     2.0
Name: Ran, dtype: float64

Upvotes: 1

Related Questions