Reputation: 197
I have the following dataframe in pandas read from a .csv file.
Month Date Year Obs_score
0 Jan 24,25,26 2015 -1.75
1 Mar 2 2015 -2.0
2 Apr 3 2015 -0.5
I need to unpivot the Date column so that I get the following.
Month Date Year Obs_score
0 Jan 24 2015 -1.75
1 Jan 25 2015 -1.75
2 Jan 26 2015 -1.75
3 Mar 2 2015 -2.0
4 Apr 3 2015 -0.5
Basically make the Date field uniform and have a single value per row and replicate the Obs_scores for the comma separated dates. Is there a simple way to do this? The order of the indices are irrelevant.
Upvotes: 2
Views: 1038
Reputation: 8391
A possible solution is based on pandas merge method to create databases on the basis of relations.
Case test:
# (just two columns for ease)
df = pd.DataFrame({'Date' : ['24,25,26','2','3']
, 'score' : [-1.75,-2.0,-0.5] })
i.e.
Date score
0 24,25,26 -1.75
1 2 -2.00
2 3 -0.50
1: produce the "split" column (the assumption that the entries are comma separated values in strings)
b = df['Date'].apply(lambda x : pd.Series(x.split(","))).stack()
0 0 24
1 25
2 26
1 0 2
2 0 3
2: reset the indices and merge on the base of the proper labels
b_reset = b.reset_index()
level_0 level_1 0
0 0 0 24
1 0 1 25
2 0 2 26
3 1 0 2
4 2 0 3
df_reset = df.reset_index()
index Date score
0 0 24,25,26 -1.75
1 1 2 -2.00
2 2 3 -0.50
It's clear that one has to merge b_reset
and df_reset
in a relation one-to-many, linking level_0
of b_reset
and index
of df_reset
:
df_temp = pd.merge(b_reset
, df_reset
, left_on = 'level_0'
, right_on = 'index' )
3: we finally keep just the useful columns
df_t[['score',0]]
score 0
0 -1.75 24
1 -1.75 25
2 -1.75 26
3 -2.00 2
4 -0.50 3
Upvotes: 1