Reputation: 1
the data frame looks like this. I have tried with pivot, stack, unstack. Is there any method to achieve the output
key attribute text_value numeric_value date_value
0 1 order NaN NaN 10/02/19
1 1 size NaN 43.0 NaN
2 1 weight NaN 22.0 NaN
3 1 price NaN 33.0 NaN
4 1 segment product NaN NaN
5 2 order NaN NaN 11/02/19
6 2 size NaN 34.0 NaN
7 2 weight NaN 32.0 NaN
8 2 price NaN 89.0 NaN
9 2 segment customer NaN NaN
I need the following output
key order size weight price segment
1 10/2/2019 43.0 22.0 33.0 product
2 11/2/2019 34.0 32.0 89.0 customer
Thanks in advance
Upvotes: 0
Views: 38
Reputation: 1
This is the solution, I figured out
attr_dict = {'order':'date_value', 'size':'numeric_value', 'weight':'numeric_value', 'price':'numeric_value', 'segment':'text_value'}
output_table = pd.DataFrame()
for attr in attr_dict.keys():
temp = input_table[input_table['attribute'] == attr][['key', attr_dict[attr]]]
temp.rename(columns={attr_dict[attr]:attr}, inplace=True)
output_table[attr] = list(temp.values[:,1])
output_table
Upvotes: 0
Reputation: 862511
I believe you dont want to change dtypes
in output data, so possible solution is processing each column separately by DataFrame.dropna
and DataFrame.pivot
and then join together by concat
:
df['date_value'] = pd.to_datetime(df['date_value'])
df1 = df.dropna(subset=['text_value']).pivot('key','attribute','text_value')
df2 = df.dropna(subset=['numeric_value']).pivot('key','attribute','numeric_value')
df3 = df.dropna(subset=['date_value']).pivot('key','attribute','date_value')
df = pd.concat([df1, df2, df3], axis=1).reindex(df['attribute'].unique(), axis=1)
print (df)
attribute order size weight price segment
key
1 2019-10-02 43.0 22.0 33.0 product
2 2019-11-02 34.0 32.0 89.0 customer
print (df.dtypes)
order datetime64[ns]
size float64
weight float64
price float64
segment object
dtype: object
Old answer - all values are casted to strings:
df['date_value'] = pd.to_datetime(df['date_value'])
df['text_value'] = df['text_value'].fillna(df['numeric_value']).fillna(df['date_value'])
df = df.pivot('key','attribute','text_value')
print (df)
attribute order price segment size weight
key
1 1569974400000000000 33 product 43 22
2 1572652800000000000 89 customer 34 32
print (df.dtypes)
order object
price object
segment object
size object
weight object
dtype: object
Upvotes: 1