Reputation: 6897
I have a pandas DataFrame
that looks like the following:
bus_uid bus_type type obj_uid \
0 biomass: DEB31 biomass output Simple_139804698384200
0 biomass: DEB31 biomass other duals
0 biomass: DEB31 biomass other excess
datetime \
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
0 DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00', ...
values
0 [1.0, 2.0, 3.0, ...
0 [4.0, 5.0, 6.0, ...
0 [7.0, 8.0, 9.0, ...
And want to convert it into the following format:
bus_uid bus_type type obj_uid datetime values
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 00:00:00 1.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 01:00:00 2.0
0 biomass: DEB31 biomass output Simple_139804698384200 2015-01-01 02:00:00 3.0
0 biomass: DEB31 biomass other duals 2015-01-01 00:00:00 4.0
0 biomass: DEB31 biomass other duals 2015-01-01 01:00:00 5.0
0 biomass: DEB31 biomass other duals 2015-01-01 02:00:00 6.0
0 biomass: DEB31 biomass other excess 2015-01-01 00:00:00 7.0
0 biomass: DEB31 biomass other excess 2015-01-01 01:00:00 8.0
0 biomass: DEB31 biomass other excess 2015-01-01 02:00:00 9.0
The columns datetime
and values
have the same dimension.
I have already asked a similar question here but couldn't manage to apply the solution for my problem with two columns.
What's the best way to convert the DataFrame
into the required format?
Upvotes: 4
Views: 1752
Reputation: 42875
You could iterate through the rows to extract the Index
and Series
info from the cells. I don't think that reshaping
methods work well when you need to extract info at the same time:
Sample data:
rows = 3
df = pd.DataFrame(data={'bus_uid': list(repeat('biomass: DEB31', rows)), 'type': list(repeat('biomass', 3)), 'id': ['id1', 'id2', 'id3'], 'datetime': list(repeat(pd.DatetimeIndex(start=datetime(2016,1,1), periods=3, freq='D'), rows)), 'values': list(repeat([1,2,3], rows))})
bus_uid datetime id \
0 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id1
1 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id2
2 biomass: DEB31 DatetimeIndex(['2016-01-01', '2016-01-02', '20... id3
type values
0 biomass [1, 2, 3]
1 biomass [1, 2, 3]
2 biomass [1, 2, 3]
Build new DataFrame
as you iterate through the DataFrame
rows
:
new_df = pd.DataFrame()
for index, cols in df.iterrows():
extract_df = pd.DataFrame.from_dict({'datetime': cols.ix['datetime'], 'values': cols.ix['values']})
extract_df = pd.concat([extract_df, cols.drop(['datetime', 'values']).to_frame().T], axis=1).fillna(method='ffill').fillna(method='bfill')
new_df = pd.concat([new_df, extract_df], ignore_index=True)
to get:
datetime values bus_uid id type
0 2016-01-01 1 biomass: DEB31 id1 biomass
1 2016-01-02 2 biomass: DEB31 id1 biomass
2 2016-01-03 3 biomass: DEB31 id1 biomass
3 2016-01-01 1 biomass: DEB31 id2 biomass
4 2016-01-02 2 biomass: DEB31 id2 biomass
5 2016-01-03 3 biomass: DEB31 id2 biomass
6 2016-01-01 1 biomass: DEB31 id3 biomass
7 2016-01-02 2 biomass: DEB31 id3 biomass
8 2016-01-03 3 biomass: DEB31 id3 biomass
Upvotes: 2
Reputation: 862511
You can extract from columns values
and datetime
new Series
and then merge them with original dataframe df
by concat
:
s1 = df['values'].apply(pd.Series, 1).stack()
s1.index = s1.index.droplevel(-1) # to line up with df's index
s1.name = 'values' # needs a name to join
s2 = df['datetime'].apply(pd.Series, 1).stack()
s2.index = s2.index.droplevel(-1) # to line up with df's index
s2.name = 'datetime' # needs a name to join
#remove duplicity columns
df = df.drop( ['values', 'datetime'], axis=1)
#concat all together
df= pd.concat([df,s1,s2], axis=1).reset_index(drop=True)
print df
bus_uid bus_type type obj_uid values \
0 0 biomass: DEB31 biomass output Simple_139804698384200 1.0
1 0 biomass: DEB31 biomass output Simple_139804698384200 2.0
2 0 biomass: DEB31 biomass output Simple_139804698384200 3.0
3 0 biomass: DEB31 biomass other duals 4.0
4 0 biomass: DEB31 biomass other duals 5.0
5 0 biomass: DEB31 biomass other duals 6.0
6 0 biomass: DEB31 biomass other excess 7.0
7 0 biomass: DEB31 biomass other excess 8.0
8 0 biomass: DEB31 biomass other excess 9.0
datetime
0 2015-01-01 00:00:00
1 2015-01-01 01:00:00
2 2015-01-01 02:00:00
3 2015-01-01 00:00:00
4 2015-01-01 01:00:00
5 2015-01-01 02:00:00
6 2015-01-01 00:00:00
7 2015-01-01 01:00:00
8 2015-01-01 02:00:00
Upvotes: 2