Reputation: 37
I am iterating through a large list of dataframes of baseball statistics for different players. The data is indexed by year. What I am looking to do is group by year while keeping Salary the same and summing WAR. I also want to drop the rows whose Year entries are not single years; in my data set those entries are strings like "11 Seasons".
for x in clean_stats_list:
    x.groupby("Year")

for x in clean_stats_list:
    for i in x['Year']:
        if len(i) > 4:
            x['Year'][i].drop()
WAR Year Salary
0 1.4 2008 $390,000
1 0.9 2009 $418,000
2 2.4 2010 $445,000
3 3.6 2011 $3,400,000
4 5.2 2012 $5,400,000
5 1.3 2013 $7,400,000
6 6.8 2014 $10,000,000
7 3.8 2015 $10,000,000
9 0.2 2015 $10,000,000
11 5.5 2016 $15,833,333
12 2.0 2017 $21,833,333
13 1.3 2018 $21,833,333
14 34.3 11 Seasons $96,952,999
16 25.4 CIN (8 yrs) $37,453,000
17 8.8 SFG (3 yrs) $59,499,999
This is what I am expecting to achieve:
WAR Year Salary
0 1.4 2008 $390,000
1 0.9 2009 $418,000
2 2.4 2010 $445,000
3 3.6 2011 $3,400,000
4 5.2 2012 $5,400,000
5 1.3 2013 $7,400,000
6 6.8 2014 $10,000,000
7 4.0 2015 $10,000,000
11 5.5 2016 $15,833,333
12 2.0 2017 $21,833,333
13 1.3 2018 $21,833,333
Upvotes: 1
Views: 47
Reputation: 863801
You can use a regex to validate years with Series.str.contains and boolean indexing; this avoids keeping values that have length 4 but are not actually years:
#https://stackoverflow.com/a/4374209
#validate years between 1000-2999
df1 = df[df['Year'].str.contains(r'^[12][0-9]{3}$')]
#validate any 4 digits between 0000-9999
#df1 = df[df['Year'].str.contains(r'^\d{4}$')]
print(df1)
WAR Year Salary
0 1.4 2008 $390,000
1 0.9 2009 $418,000
2 2.4 2010 $445,000
3 3.6 2011 $3,400,000
4 5.2 2012 $5,400,000
5 1.3 2013 $7,400,000
6 6.8 2014 $10,000,000
7 3.8 2015 $10,000,000
9 0.2 2015 $10,000,000
11 5.5 2016 $15,833,333
12 2.0 2017 $21,833,333
13 1.3 2018 $21,833,333
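The question's expected output also combines the duplicate 2015 rows (WAR summed, Salary kept). A minimal sketch of that second step, assuming salaries are identical within a year so taking the first is safe (sample data here is a subset of the question's table, for illustration):

```python
import pandas as pd

# Subset of the question's data, for illustration
df = pd.DataFrame({
    "WAR": [6.8, 3.8, 0.2, 5.5, 34.3],
    "Year": ["2014", "2015", "2015", "2016", "11 Seasons"],
    "Salary": ["$10,000,000", "$10,000,000", "$10,000,000",
               "$15,833,333", "$96,952,999"],
})

# Keep only four-digit year rows, as in the answer above
df1 = df[df["Year"].str.contains(r"^[12][0-9]{3}$")]

# Sum WAR per year, keep the first Salary (identical within a year)
out = df1.groupby("Year", as_index=False).agg({"WAR": "sum",
                                               "Salary": "first"})
print(out)
```

The two 2015 rows collapse into one with WAR 4.0, matching the expected output.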
Upvotes: 0
Reputation: 1092
To filter out rows based on the length of the Year column, why don't you try creating a mask and then selecting based on it?
Code:
mask_df = your_df['Year'].str.len() == 4
your_df_cleaned = your_df.loc[mask_df]
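A quick usage sketch of this mask approach on made-up data (the frame below is hypothetical, mirroring the question's string-typed Year column):

```python
import pandas as pd

# Hypothetical frame with string-typed Year values, as in the question
your_df = pd.DataFrame({
    "WAR": [2.0, 1.3, 34.3],
    "Year": ["2017", "2018", "11 Seasons"],
})

# True only where Year is exactly four characters long
mask_df = your_df["Year"].str.len() == 4
your_df_cleaned = your_df.loc[mask_df]
print(your_df_cleaned)
```

Note this relies on the Year values being strings; a four-character value that is not a year (e.g. "abcd") would also pass, which is why the regex approach above is stricter.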
Upvotes: 1