Reputation: 159
I have yield data for some villages, For sample data, please see attachment.enter image description here
Village Yield(in Kg) Date
Village1 0.22 01/06/18
Village1 0.23 02/06/18
Village1 0.55 01/06/18
Village1 0.2 02/06/18
Village2 0.88 31/05/18
Village2 0.89 30/05/18
Village2 0.63 30/05/18
Village2 0.55 30/05/18
Now, I want to showing that , villages corresponding yield data with experiment date.so, village1 has 4 yield values.
Please see attachment.enter image description here
Village Yield-1 Yield-2 Yield-3
Village1 0.22 01/06/18 0.23 02/06/18 0.55 01/06/18
Any help it will helpful. Thanks
Upvotes: 3
Views: 124
Reputation: 71600
Try using groupby
then get the values of it then convert group
to a dictionary then create a data frame out of the dictionary then transpose it then create a new column for the average using mean
:
import pandas as pd
df = pd.DataFrame({'Village': ['Village1', 'Village1',
'Village1', 'Village1', 'Village2',
'Village2', 'Village2', 'Village2'],
'Yield (in kg)': [0.22,0.23,0.55,0.2, 0.88, 0.89, 0.63, 0.55]})
group = df.groupby('Village')['Yield (in kg)'].apply(lambda x: x.values)
df = pd.DataFrame(group.to_dict()).T
df.columns = df.columns.astype(str)
df['Average'] = df.mean(axis=1)
print(df)
Output:
0 1 2 3 Average
Village1 0.22 0.23 0.55 0.20 0.3000
Village2 0.88 0.89 0.63 0.55 0.7375
to rename the columns:
df.columns = ['Yield (in kg)-'+i for i in df.columns if i != 'Average']
Output:
Yield (in kg)-0 Yield (in kg)-1 Yield (in kg)-2 Yield (in kg)-3 /
Village1 0.22 0.23 0.55 0.20
Village2 0.88 0.89 0.63 0.55
Average
Village1 0.3000
Village2 0.7375
Upvotes: 1
Reputation: 11192
try this,
df.groupby(['Village']).apply(lambda x:pd.Series(zip(x['Yield(in Kg)'],x['Date']))).reset_index()
Village 0 1 2 \
0 Village1 (0.22, 01/06/18) (0.23, 02/06/18) (0.55, 01/06/18)
1 Village2 (0.88, 31/05/18) (0.89, 30/05/18) (0.63, 30/05/18)
3
0 (0.2, 02/06/18)
1 (0.55, 30/05/18)
to rename the columns:
col1=df.filter(regex='\d+').columns.values
col2=['Yield - '+str(col+1) for col in col1]
df.rename(columns= dict(zip(col1,col2)),inplace=True)
Village Yield - 1 Yield - 2 Yield - 3 \
0 Village1 (0.22, 01/06/18) (0.23, 02/06/18) (0.55, 01/06/18)
1 Village2 (0.88, 31/05/18) (0.89, 30/05/18) (0.63, 30/05/18)
Yield - 4
0 (0.2, 02/06/18)
1 (0.55, 30/05/18)
Upvotes: 0
Reputation: 863166
Use:
df = (df.set_index(['Village', df.groupby('Village').cumcount().add(1)])
.unstack()
.sort_index(axis=1, level=1))
df.columns = ['{}-{}'.format(x, y) for x, y in df.columns]
df = df.reset_index()
print (df)
Village Yield(in Kg)-1 Date-1 Yield(in Kg)-2 Date-2 \
0 Village1 0.22 01/06/18 0.23 02/06/18
1 Village2 0.88 31/05/18 0.89 30/05/18
Yield(in Kg)-3 Date-3 Yield(in Kg)-4 Date-4
0 0.55 01/06/18 0.20 02/06/18
1 0.63 30/05/18 0.55 30/05/18
Explanation:
set_index
with groups created by GroupBy.cumcount
and unstack
for reshapeMultiIndex
by sort_index
Multiindex
in columns with list comprehensionreset_index
Upvotes: 1