Reputation: 943
I am trying to manipulate a pandas DataFrame so I can count the frequency of occurrences of a value in one column after a specific occurrence in another column. The naive loop below probably sums it up best. Any help would be much appreciated!
import datetime
import time
import pandas as pd
# Set number of rows to skip
rows_to_skip = 0
# Rows to use after skipped rows
rows_to_use = 10000
# Read the file (Adjust arguments accordingly)
data = pd.read_csv('example.csv', skiprows=rows_to_skip, error_bad_lines=False, nrows=rows_to_use, low_memory=False)
# Add headers when skipping rows
data.columns = ["X","Y","Z"]
# Pseudocode made concrete below: for each row where X > 0, count how
# many of the subsequent rows meet both conditions:
#   condition 1) Y > 0
#   condition 2) Z <= Z of the row where X was > 0
# then store the total in a new column, on the same row as that X.
counts = []
for i, x in enumerate(data['X']):
    if x > 0:
        rest = data.iloc[i + 1:]
        counts.append(((rest['Y'] > 0) & (rest['Z'] <= data['Z'].iloc[i])).sum())
    else:
        counts.append(None)
data['count'] = counts
Upvotes: 0
Views: 2470
Reputation: 42875
Assuming you want the count for the cases between every instance where X>0, as opposed to the count over the remainder of the entire DataFrame after each X>0:
You could create a new column that indicates where the X>0 condition holds True, .fillna(method='ffill') it, and .groupby() on the result. Then you only need to .apply() to get the len() of each group where the other conditions are True.
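To make the grouping key concrete, here is a minimal sketch on toy values (not the answer's random data): the index is kept only where X > 0 and forward-filled, so every row carries the index of the most recent X > 0 row above it.

import pandas as pd

s = pd.Series([-3, 5, 1, -2, 4, -6], name='X')
# Keep the row index only where X > 0, then forward-fill
key = s.index.to_series().where(s > 0).fillna(method='ffill')
print(key.tolist())  # [nan, 1.0, 2.0, 2.0, 4.0, 4.0]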
Some sample data:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(-10, 10, size=(100, 3)), columns=list('XYZ'))
X Y Z
0 -3 6 -7
1 -4 -10 -1
2 9 -10 -9
3 5 0 -8
4 -2 1 -8
Proceed as follows:
# Mark each row with the index of the most recent row where X > 0
df['condition'] = df.index.to_series().where(df.X > 0).fillna(method='ffill')
# Count, within each group, the rows that meet the other conditions;
# the result is indexed by group key, so it aligns with the X > 0 rows
df['count'] = df.groupby('condition').apply(lambda x: len(x[(x.Y > 0) & (x.Z > x.Z.iloc[0])]))
to get:
X Y Z condition count
0 -3 6 -7 NaN NaN
1 -4 -10 -1 NaN NaN
2 9 -10 -9 2.0 0.0
3 5 0 -8 3.0 0.0
4 -2 1 -8 3.0 NaN
5 6 -6 -3 5.0 1.0
6 0 6 3 5.0 NaN
7 -6 -7 -6 5.0 NaN
8 7 -2 -5 8.0 0.0
9 0 -1 5 8.0 NaN
10 5 8 -3 10.0 0.0
11 -2 -2 1 10.0 NaN
12 3 4 2 12.0 1.0
13 -5 1 -9 12.0 NaN
14 -7 2 6 12.0 NaN
15 1 -10 6 15.0 0.0
16 1 -8 6 16.0 0.0
17 -4 -9 -8 16.0 NaN
18 -9 4 6 16.0 NaN
19 5 -6 2 19.0 0.0
20 5 7 -1 20.0 0.0
21 2 -2 -3 21.0 0.0
22 -6 -10 -2 21.0 NaN
23 -7 -9 3 21.0 NaN
24 -8 7 -8 21.0 NaN
25 3 -3 6 25.0 0.0
26 1 -6 -3 26.0 1.0
27 -4 6 -1 26.0 NaN
28 6 -4 9 28.0 0.0
29 -8 2 1 28.0 NaN
.. .. .. .. ... ...
70 -5 7 -6 68.0 NaN
71 6 6 -7 71.0 1.0
72 -3 0 3 71.0 NaN
73 -5 3 2 71.0 NaN
74 -6 -8 8 71.0 NaN
75 1 0 -4 75.0 0.0
76 7 -9 -5 76.0 0.0
77 1 0 -1 77.0 0.0
78 5 9 -2 78.0 0.0
79 -8 -9 -6 78.0 NaN
80 2 -3 3 80.0 3.0
81 -7 -5 8 80.0 NaN
82 -4 -5 -7 80.0 NaN
83 -3 5 -6 80.0 NaN
84 -5 1 4 80.0 NaN
85 -1 6 7 80.0 NaN
86 -7 4 4 80.0 NaN
87 -7 -4 -1 80.0 NaN
88 -2 -8 2 80.0 NaN
89 4 6 4 89.0 0.0
90 4 -10 -8 90.0 0.0
91 -7 -9 5 90.0 NaN
92 5 3 -1 92.0 0.0
93 6 6 6 93.0 0.0
94 9 -2 0 94.0 1.0
95 -1 5 5 94.0 NaN
96 2 8 -9 96.0 2.0
97 -6 7 -4 96.0 NaN
98 -1 7 -8 96.0 NaN
99 -4 0 -1 96.0 NaN
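Note that this demo counts x.Z > x.Z.iloc[0], while the question's condition 2 compares with Z <= the Z of the triggering row, over subsequent rows only. A minimal sketch adapting the same grouping idea to that exact condition (still counting between successive X>0 rows, per the assumption above; written against recent pandas, where .ffill() replaces the now-deprecated fillna(method='ffill')):

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(-10, 10, size=(100, 3)), columns=list('XYZ'))

# Label every row with the index of the most recent row where X > 0
key = df.index.to_series().where(df.X > 0).ffill()

# Per group, look only at the rows after the triggering row (iloc[1:])
# and count those with Y > 0 and Z <= Z of the triggering row
df['count'] = df.groupby(key).apply(
    lambda g: ((g.Y.iloc[1:] > 0) & (g.Z.iloc[1:] <= g.Z.iloc[0])).sum()
)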
Upvotes: 1