Reputation: 29
I am taking the Udacity data analysis course and I am having trouble understanding an answer.
I have been asked to "create color array for red dataframe".
The answer is:
color_red = np.repeat('red', red_df.shape[0])
I understand that in np.repeat the first parameter is the input array, 'red', and the second parameter, red_df.shape[0], is the number of repetitions for each element.
For example, np.repeat(3, 4) returns array([3, 3, 3, 3]).
Can anybody point me in the right direction?
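A minimal sketch of the np.repeat behavior described above, extended to a string. The count 4 is arbitrary; the point is that a scalar string such as 'red' is repeated exactly like the scalar 3, and the second argument is just an ordinary integer.

```python
import numpy as np

# A scalar integer repeated 4 times, as in the question
print(np.repeat(3, 4))          # [3 3 3 3]

# A scalar string works the same way; the count is a plain int
colors = np.repeat('red', 4)
print(colors)                   # ['red' 'red' 'red' 'red']
print(colors.shape)             # (4,)
```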
Upvotes: 0
Views: 4952
Reputation: 1
This part, red_df.shape[0], just returns an integer: the total number of rows in red_df.
It is there so that the new 'color' column has exactly as many entries as red_df has rows. That way, when we append it to white_df later, each label stays on its own row instead of shifting the white_df rows down and creating empty rows in the other columns. (In practice, pandas refuses to assign a column whose length does not match the number of rows and raises a ValueError.)
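A small sketch of that row-count matching, assuming a hypothetical 3-row stand-in for red_df (the column names and values below are made up; the real file has many more rows). It shows that shape[0] is the row count, that a color array of that length is accepted, and that an array of the wrong length is rejected.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row stand-in for red_df (values are made up)
red_df = pd.DataFrame({'alcohol': [9.4, 9.8, 10.0], 'quality': [5, 5, 6]})

n_rows = red_df.shape[0]          # first element of the (rows, columns) tuple
print(red_df.shape, n_rows)       # (3, 2) 3

# One label per row, so the new column lines up with the existing rows
red_df['color'] = np.repeat('red', n_rows)

# A color array of the wrong length is rejected rather than padded
try:
    red_df['color'] = np.repeat('red', n_rows + 1)
except ValueError as err:
    print('ValueError:', err)
```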
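You could simply drop that expression and hard-code the row count instead, turning this:
color_red = np.repeat('red', red_df.shape[0])
into this:
color_red = np.repeat('red', 1599)
That only works while the red file has exactly 1599 rows, though; red_df.shape[0] keeps working if the data changes.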
The full program would be:
import pandas as pd
import numpy as np
df_red = pd.read_csv('winequality-red.csv',sep=';')
df_white = pd.read_csv('winequality-white.csv',sep=';')
df_red.info()  # info() prints its summary itself and returns None, so no print() is needed
print(df_red.shape[0])
# shape[0] is the number of rows (1599); shape[1] is the number of columns (12)
# create color array for red dataframe
color_red = np.repeat('red', 1599)
# create color array for white dataframe
color_white = np.repeat('white', df_white.shape[0])
df_red['color'] = color_red
df_white['color'] = color_white
# combine the two data frames into one data frame called wine_df
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
wine_df = pd.concat([df_red, df_white])
print(wine_df.head())
wine_df.to_csv('winequality_edited.csv', index=False)
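Since the program above needs the two CSV files, here is a self-contained sketch of the same color-and-combine steps on tiny hypothetical frames (the values are made up and only two columns are kept). It uses df.shape[0] for both color arrays rather than a hard-coded count, and passes ignore_index=True to pd.concat so the combined frame gets a fresh 0..n-1 index; that flag is a choice made here, not part of the original program.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSV files (values are made up)
df_red = pd.DataFrame({'alcohol': [9.4, 9.8], 'quality': [5, 5]})
df_white = pd.DataFrame({'alcohol': [8.8, 9.5, 10.1], 'quality': [6, 6, 7]})

# One color label per row of each frame
df_red['color'] = np.repeat('red', df_red.shape[0])
df_white['color'] = np.repeat('white', df_white.shape[0])

# Stack the frames vertically; ignore_index gives a fresh 0..n-1 index
wine_df = pd.concat([df_red, df_white], ignore_index=True)
print(wine_df)
print(wine_df['color'].value_counts())
```

Because each color array has exactly one entry per row, no row in wine_df ends up with a missing value.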
Upvotes: 0
Reputation: 231530
Get into an interactive Python session with numpy and pandas, and experiment.
Make a dataframe:
In [394]: df=pd.DataFrame(np.eye(3))
In [395]: df
Out[395]:
     0    1    2
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
Check its shape. That's a tuple (a basic Python object):
In [396]: df.shape
Out[396]: (3, 3)
In [397]: df.shape[0] # first element of the tuple
Out[397]: 3
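Because np.eye(3) is square, it does not show which element of shape is which. A standalone sketch with a hypothetical 4x2 frame of zeros makes the distinction visible: the first element is the number of rows, the second the number of columns, and len(df) also counts rows.

```python
import numpy as np
import pandas as pd

# A 4x2 frame, so the row and column counts differ
df = pd.DataFrame(np.zeros((4, 2)))

print(df.shape)       # (4, 2)
print(df.shape[0])    # 4 -> number of rows
print(df.shape[1])    # 2 -> number of columns
print(len(df))        # 4 -> len() of a DataFrame also counts rows
```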
Calling repeat with that element of the shape is just like using the number 3:
In [398]: np.repeat('red', df.shape[0])
Out[398]: array(['red', 'red', 'red'], dtype='<U3')
Pandas and numpy run in Python, so the regular Python evaluation order applies: df.shape[0] is evaluated to the integer 3 first, and only then is that integer passed to np.repeat.
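A short sketch of that evaluation order, reusing the np.eye(3) frame from the session above. Storing df.shape[0] in a variable first (the name n is just for illustration) gives exactly the same call, because np.repeat only ever sees the resulting integer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.eye(3))

# Python evaluates the argument expression first...
n = df.shape[0]               # n is the plain integer 3
# ...so these calls receive exactly the same arguments
a = np.repeat('red', df.shape[0])
b = np.repeat('red', n)
c = np.repeat('red', 3)
print(n)                                          # 3
print(np.array_equal(a, b), np.array_equal(a, c)) # True True
```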
Upvotes: 3