Reputation:
Appreciate any help on this one. I have 7 CSV files (all in the same format) that I have concatenated into one frame. My goal here is to compare two columns from the CSVs and find out how many times the word "Done" from the "Ran" column shows up on each date from the "Date" column. So far this is what I have written:
import glob
import pandas as pd

path = r'C:\Users\rock\Desktop\workspace\MTS_subs'
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
counter = frame['Ran'].value_counts()
date_counter = frame['Date'].value_counts()
print(counter, date_counter)
This prints out the following:
Active    1739
Done       840
Name: Ran, dtype: int64

18/06/2020    402
19/06/2020    300
17/06/2020    266
25/06/2020    264
22/06/2020    224
16/06/2020    214
23/06/2020    208
24/06/2020    208
26/06/2020    184
15/06/2020    180
21/06/2020     76
14/06/2020     46
20/06/2020      4
13/06/2020      3
Name: Date, dtype: int64
So across all 7 CSVs, the word "Done" appears 840 times, but I would like to find out how many times "Done" appears on each of those dates.
I've been scratching my head at this one for some time. Any help or input is very much appreciated.
Upvotes: 1
Views: 73
Reputation: 2163
(frame['Ran'] == 'Done').groupby(frame['Date']).sum()
should do the trick: the comparison produces a boolean Series, and summing it within each date group counts the `True` values. Below is an example that simulates the screenshot that was posted.
>>> frame = pd.DataFrame({
... 'Date': ['13/06/2020']*3 + ['15/06/2020']*2 + ['14/06/2020']*12,
... 'Ran': ['Done']*17
... })
>>> frame
Date Ran
0 13/06/2020 Done
1 13/06/2020 Done
2 13/06/2020 Done
3 15/06/2020 Done
4 15/06/2020 Done
5 14/06/2020 Done
6 14/06/2020 Done
7 14/06/2020 Done
8 14/06/2020 Done
9 14/06/2020 Done
10 14/06/2020 Done
11 14/06/2020 Done
12 14/06/2020 Done
13 14/06/2020 Done
14 14/06/2020 Done
15 14/06/2020 Done
16 14/06/2020 Done
>>> (frame['Ran'] == 'Done').groupby(frame['Date']).sum()
Date
13/06/2020 3.0
14/06/2020 12.0
15/06/2020 2.0
Name: Ran, dtype: float64
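An equivalent approach, sketched here on a small made-up frame that mixes "Active" and "Done" rows (the example data above is all "Done"), is to filter to the "Done" rows first and then use `value_counts()` on the dates:

```python
import pandas as pd

# Small illustrative frame mixing "Active" and "Done" rows.
frame = pd.DataFrame({
    'Date': ['13/06/2020', '13/06/2020', '14/06/2020', '14/06/2020', '14/06/2020'],
    'Ran':  ['Done',       'Active',     'Done',       'Done',       'Active'],
})

# Keep only the "Done" rows, then count how many fall on each date.
done_per_date = frame.loc[frame['Ran'] == 'Done', 'Date'].value_counts().sort_index()
print(done_per_date)
# 13/06/2020    1
# 14/06/2020    2
```

One difference worth noting: `value_counts()` only lists dates where "Done" actually occurs, whereas the groupby-sum version reports a 0 for dates with no "Done" rows.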
Upvotes: 1