Reputation: 159

Using python pandas how to do some following analysis

I have yield data for some villages, For sample data, please see attachment.enter image description here

Village Yield(in Kg)    Date
Village1    0.22    01/06/18
Village1    0.23    02/06/18
Village1    0.55    01/06/18
Village1    0.2     02/06/18
Village2    0.88    31/05/18
Village2    0.89    30/05/18
Village2    0.63    30/05/18
Village2    0.55    30/05/18

Now, I want to showing that , villages corresponding yield data with experiment date.so, village1 has 4 yield values.

Please see attachment.enter image description here

Village     Yield-1                Yield-2              Yield-3 
Village1    0.22    01/06/18        0.23    02/06/18    0.55    01/06/18

Any help it will helpful. Thanks

Upvotes: 3

Answers (3)

U13-Forward

Reputation: 71600

Try using groupby then get the values of it then convert group to a dictionary then create a data frame out of the dictionary then transpose it then create a new column for the average using mean:

import pandas as pd
df = pd.DataFrame({'Village': ['Village1', 'Village1',
                               'Village1', 'Village1', 'Village2',
                               'Village2', 'Village2', 'Village2'],
                   'Yield (in kg)': [0.22,0.23,0.55,0.2, 0.88, 0.89, 0.63, 0.55]})
group = df.groupby('Village')['Yield (in kg)'].apply(lambda x: x.values)
df = pd.DataFrame(group.to_dict()).T
df.columns = df.columns.astype(str)
df['Average'] = df.mean(axis=1)
print(df)

Output:

          0     1     2     3      Average
Village1  0.22  0.23  0.55  0.20   0.3000
Village2  0.88  0.89  0.63  0.55   0.7375

to rename the columns:

df.columns = ['Yield (in kg)-'+i for i in df.columns if i != 'Average']

Output:

          Yield (in kg)-0  Yield (in kg)-1  Yield (in kg)-2  Yield (in kg)-3   /
Village1   0.22             0.23             0.55             0.20   
Village2   0.88             0.89             0.63             0.55   

                         Average  
Village1                 0.3000  
Village2                 0.7375

Upvotes: 1

Mohamed Thasin ah

Reputation: 11192

try this,

df.groupby(['Village']).apply(lambda x:pd.Series(zip(x['Yield(in Kg)'],x['Date']))).reset_index()


    Village                 0                 1                 2  \
0  Village1  (0.22, 01/06/18)  (0.23, 02/06/18)  (0.55, 01/06/18)   
1  Village2  (0.88, 31/05/18)  (0.89, 30/05/18)  (0.63, 30/05/18)   

                  3  
0   (0.2, 02/06/18)  
1  (0.55, 30/05/18)

to rename the columns:

col1=df.filter(regex='\d+').columns.values
col2=['Yield - '+str(col+1) for col in col1]
df.rename(columns= dict(zip(col1,col2)),inplace=True)

    Village         Yield - 1         Yield - 2         Yield - 3  \
0  Village1  (0.22, 01/06/18)  (0.23, 02/06/18)  (0.55, 01/06/18)   
1  Village2  (0.88, 31/05/18)  (0.89, 30/05/18)  (0.63, 30/05/18)   

          Yield - 4  
0   (0.2, 02/06/18)  
1  (0.55, 30/05/18)

Upvotes: 0

jezrael

Reputation: 863166

Use:

df = (df.set_index(['Village', df.groupby('Village').cumcount().add(1)])
        .unstack()
        .sort_index(axis=1, level=1))

df.columns = ['{}-{}'.format(x, y) for x, y in df.columns]
df = df.reset_index()
print (df)
    Village  Yield(in Kg)-1    Date-1  Yield(in Kg)-2    Date-2  \
0  Village1            0.22  01/06/18            0.23  02/06/18   
1  Village2            0.88  31/05/18            0.89  30/05/18   

   Yield(in Kg)-3    Date-3  Yield(in Kg)-4    Date-4  
0            0.55  01/06/18            0.20  02/06/18  
1            0.63  30/05/18            0.55  30/05/18

Explanation:

First set_index with groups created by GroupBy.cumcount and unstack for reshape
Sort second level of MultiIndex by sort_index
Flatten Multiindex in columns with list comprehension
Create column from index by reset_index

Upvotes: 1

Using python pandas how to do some following analysis

Answers (3)

Related Questions