BritJam

Reputation: 29

Understanding dataframe.shape (df.shape)

I am taking the Udacity data analysis course and I am having trouble understanding an answer.

I have been asked to "create color array for red dataframe".

The answer is:

color_red = np.repeat('red', red_df.shape[0])

I understand that in np.repeat the first parameter is the input to repeat, 'red', and the second parameter is the number of repeats for each element, red_df.shape[0].

For example, np.repeat(3, 4) would return array([3, 3, 3, 3]).

Can anybody point me in the right direction on what red_df.shape[0] is doing here?

Upvotes: 0

Views: 4952

Answers (2)

islam ibrahim

Reputation: 1

This part (red_df.shape[0]) just returns an integer: the total number of rows in red_df. It is used so that the new 'color' column has the same number of rows as red_df itself. That way, when red_df is later appended to white_df, every row already has a color value, and nothing is shifted down or left as empty cells in the other columns.
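To see why the length has to match, here is a minimal sketch. The small red_df below is a hypothetical stand-in for the real wine data (the column values are made up for illustration), and n_rows and mismatch_rejected are just illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for red_df: 4 rows, 2 columns (values are made up).
red_df = pd.DataFrame({"pH": [3.5, 3.2, 3.3, 3.4], "quality": [5, 5, 6, 6]})

n_rows = red_df.shape[0]               # number of rows, here 4
color_red = np.repeat("red", n_rows)   # one 'red' per row
red_df["color"] = color_red            # lengths match, so the assignment works

# An array with a different length is rejected by pandas.
try:
    red_df["bad"] = np.repeat("red", n_rows + 1)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(red_df)
print("wrong length rejected:", mismatch_rejected)
```

Using shape[0] instead of a hard-coded number means the array always has the right length, even if the file later gains or loses rows.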

Because red_df has 1599 rows, you could replace that expression with the literal number. These two lines give the same result:

color_red = np.repeat('red', red_df.shape[0])
color_red = np.repeat('red', 1599)

The full program would be:

import pandas as pd
import numpy as np

df_red = pd.read_csv('winequality-red.csv',sep=';')

df_white = pd.read_csv('winequality-white.csv',sep=';')

print(df_red.info())

print(df_red.shape[0])

# shape[0] is the number of rows (1599); shape[1] is the number of columns (12)

# create color array for red dataframe
color_red = np.repeat('red', 1599)

# create color array for white dataframe
color_white = np.repeat('white', df_white.shape[0])


df_red['color'] = color_red

df_white['color'] = color_white

# combine both dataframes into one dataframe called wine_df
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)

wine_df = pd.concat([df_red, df_white])

print(wine_df.head())

wine_df.to_csv('winequality_edited.csv', index=False)
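As a side note, and not part of the course's answer: pandas will also fill a whole column if you assign a single value, so np.repeat is not strictly required for this step. A rough sketch, using tiny hypothetical frames in place of the CSV files, and pd.concat because DataFrame.append no longer exists in pandas 2.0:

```python
import pandas as pd

# Hypothetical miniature frames standing in for the two wine CSV files.
df_red = pd.DataFrame({"pH": [3.5, 3.2], "quality": [5, 6]})
df_white = pd.DataFrame({"pH": [3.0, 3.3, 3.2], "quality": [6, 6, 7]})

# Assigning a scalar broadcasts it to every row of the frame.
df_red["color"] = "red"
df_white["color"] = "white"

# Stack the two frames; ignore_index=True renumbers the rows 0..4.
wine_df = pd.concat([df_red, df_white], ignore_index=True)
print(wine_df)
```

The np.repeat version from the course is still useful for learning what shape[0] means; this is just a shorter way to get the same column.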

Upvotes: 0

hpaulj

Reputation: 231530

Get into an interactive Python session with numpy and pandas, and experiment.

Make a dataframe:

In [394]: df=pd.DataFrame(np.eye(3))                                            
In [395]: df                                                                    
Out[395]: 
     0    1    2
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0

Check its shape. That's a tuple (basic Python object):

In [396]: df.shape                                                              
Out[396]: (3, 3)
In [397]: df.shape[0]     # first element of the tuple                                                          
Out[397]: 3

Calling np.repeat with df.shape[0] is just like calling it with the number 3:

In [398]: np.repeat('red', df.shape[0])                                         
Out[398]: array(['red', 'red', 'red'], dtype='<U3')

Pandas and numpy are running in Python, so the regular evaluation order of Python applies: df.shape[0] is evaluated to an integer first, and that integer is then passed to np.repeat.
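If it helps, the same call can be unpacked into separate steps. This is only a sketch of the evaluation order, reusing the np.eye(3) frame from above; the names shape, n_rows, one_step and two_step are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.eye(3))

# Step 1: the attribute lookup gives a plain Python tuple.
shape = df.shape            # (3, 3)

# Step 2: indexing the tuple gives a plain Python int.
n_rows = shape[0]           # 3

# Step 3: np.repeat only ever receives that int.
two_step = np.repeat("red", n_rows)

# Written in one line, Python does the same steps in the same order.
one_step = np.repeat("red", df.shape[0])

print(type(shape), type(n_rows))
print(np.array_equal(one_step, two_step))
```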

Upvotes: 3
