Reputation: 3136
I have a csv file like this:
year month Company A Company B Company C
1990 Jan 10 15 20
1990 Feb 11 14 21
1990 mar 13 8 23
1990 april 12 22 19
1990 may 15 12 18
1990 june 18 13 13
1990 june 12 14 15
1990 july 12 14 16
1991 Jan 11 16 13
1991 Feb 14 17 11
1991 mar 23 13 12
1991 april 23 21 10
1991 may 22 22 9
1991 june 24 20 32
1991 june 12 14 15
1991 july 21 14 16
1992 Jan 10 13 26
1992 Feb 9 11 19
1992 mar 23 12 18
1992 april 12 10 21
1992 may 17 9 10
1992 june 15 42 9
1992 june 16 9 26
1992 july 15 26 19
1993 Jan 18 19 20
1993 Feb 19 18 21
1993 mar 20 21 23
1993 april 21 10 19
1993 may 13 9 14
1993 june 14 23 23
1993 june 15 21 23
1993 july 16 10 22
I want to find out for each company the month and year where they had the highest number of sale for ex: for company A
in year 1990 they had highest sale of 18. I want to do this using pandas. but to understand how to proceed with this. pointers needed please.
ps: here is what I have done till now.
import pandas as pd
df = pd.read_csv('SAMPLE.csv')
num_of_rows = len(df.index)
years_list = []
months_list = []
company_list = df.columns[2:]
for each in df.columns[2:]:
each = []
for i in range(0,num_of_rows):
years_list.append(df[df.columns[0]][i])
months_list.append(df[df.columns[1]][i])
years_list = list(set(years_list))
months_list = list(set(months_list))
for each in years_list:
for c in company_list:
print df[(df.year == each)][c].max()
I am getting the biggest number for a year for a company but how to get the month and year also I dont know.
Upvotes: 0
Views: 4474
Reputation: 394041
Use a combination of idxmax()
and loc
to filter the dataframe:
In [36]:
import pandas as pd
import io
temp = """year month Company_A Company_B Company_C
1990 Jan 10 15 20
1990 Feb 11 14 21
1990 mar 13 8 23
1990 april 12 22 19
1990 may 15 12 18
1990 june 18 13 13
1990 june 12 14 15
1990 july 12 14 16
1991 Jan 11 16 13
1991 Feb 14 17 11
1991 mar 23 13 12
1991 april 23 21 10
1991 may 22 22 9
1991 june 24 20 32
1991 june 12 14 15
1991 july 21 14 16
1992 Jan 10 13 26
1992 Feb 9 11 19
1992 mar 23 12 18
1992 april 12 10 21
1992 may 17 9 10
1992 june 15 42 9
1992 june 16 9 26
1992 july 15 26 19
1993 Jan 18 19 20
1993 Feb 19 18 21
1993 mar 20 21 23
1993 april 21 10 19
1993 may 13 9 14
1993 june 14 23 23
1993 june 15 21 23
1993 july 16 10 22"""
df = pd.read_csv(io.StringIO(temp),sep='\s+')
# the call to .unique() is because the same row for A and C appears twice
df.loc[df[['Company_A', 'Company_B', 'Company_C']].idxmax().unique()]
Out[36]:
year month Company_A Company_B Company_C
13 1991 june 24 20 32
21 1992 june 15 42 9
Upvotes: 1