How to assign column with string values to dataframe

Question

I have a csv file with multiple columns of which I want to return these 4 columns. There exist duplicates in the Case_ID column with a different start and or completion time. The goal is to return the case_id with the min start_time and max completion_time and the corresponding casetype.

Case_ID         Start_time   Completion_time   Casetype 

BZ-06-0501-001  2006-10-24  2007-06-15         'string'
BZ-06-0507-001  2006-10-30  2007-06-18         'string'
BZ-06-0508-001  2006-10-30  2008-09-08         'string'
BZ-06-0509-001  2006-10-30  2008-09-08         'string'

This is the code I use

def f():
    appeals = pd.read_csv('appeals')
    min_max = appeals[['Case_ID','Start_time','Completion_time']].groupby('Case_ID').agg({'Start_time' : ['min'], 'Completion_time' : ['max']}).assign(Casetype = appeals['Casetype'])
    return min_max

The problem is that the casetype column returns as nan. So the column the values in the column are not seen as a string anymore. Did I use assign() wrongly?

XavierBrt · Accepted Answer

One possible solution:

min_max = appeals.groupby('Case_ID').agg({'Start_time' : 'min', 'Completion_time' : 'max', 'Casetype': 'first'}).reset_index()

I removed your selection of columns ([['Case_ID','Start_time','Completion_time']]) in order to keep all the columns, including Casetype. Then, I added Casetype in the aggregation, with the selection of the first occurence, as you said all Casetype are the same for a given Case_ID. Then, I removed the .assign().

How to assign column with string values to dataframe

Answers (1)

Related Questions