How to fix incorrect format in pandas column?

Question

I have a pandas dataframe called df1

ID| ACTIVITY   | Date
1 | activity 1 | 2/04/2016
2 | activity 2 | 3/04/2015
3 | activity 3 | 7/05/2016
3 | activity 4 | 2/04/2016
4 | activity 3 | 2/04/2017
5 | activity 6 | 2/04/2015
5 | activity 2 | 2/04/2016
6 | activity 1 | 2/04/2018

i have too create two dataframes out of these one where activity1 is being done in year 2015 and other where activity 2 is being done in year 2015 i am using the following code

for index,rows in df1.iterrows():
    if rows['Date'].year == 2015 :
      if rows['ACTIVITY'] == 'activity 1' :
          data1 = rows['ID','Date']
          answer1 = answer1.append(data1) 

      if rows['ACTIVITY'] == 'activity 2' :
          data2 = rows['ID','Date']
          answer2 = answer2.append(data2)

this is giving following result

activity1
Index  |  0
0      | Id1
1      | Date1
0      | ID2
1      | Date2
0      | ID3
1      | Date 3 corresponding to that id

similarly

activity2
Index  |  0
0      | Id1
1      | Date1
0      | ID2
1      | Date2
0      | ID3
1      | Date 3 corresponding to that id

what i want is

   activity1
    ID  | Date
    ID1 | Date1

How to do this?

jezrael · Accepted Answer

Use DataFrame.loc for filter by conditions and columns names:

mask = (df['Date'].dt.year == 2015)
df1 = df.loc[mask & (df['ACTIVITY'] == 'activity 1'), ['ID','Date']]

df2 = df.loc[mask & (df['ACTIVITY'] == 'activity 2'), ['ID','Date']]

Your solution is not recommend, check this.

But is possible by:

df1['Date'] = pd.to_datetime(df1['Date'])

answer1, answer2 = pd.DataFrame(), pd.DataFrame()
for index,rows in df1.iterrows():
    if rows['Date'].year == 2015 :
      if rows['activity'] == 'activity 1' :
          data1 = rows[['ID','Date']]
          answer1 = answer1.append(data1) 

      if rows['activity'] == 'activity 2' :
          data2 = rows[['ID','Date']]
          answer2 = answer2.append(data2)

How to fix incorrect format in pandas column?

Answers (1)

Related Questions