NNsr
NNsr

Reputation: 1063

Read Excel with varying number of rows per column to dataframe without NAN

I have an excel file that looks like:

sample input

I would like to read this into pandas dataframe and avoid getting NAN values. The reason I am trying to avoid getting NAN is that later on, I want to create a simple contingency table similar to this

sample output

and NAN values create issues in this contingency table. Is there any elegant way of doing this without reading columns separately, using value_counts(), and concatenating the series, as below?

df_1=pd.read_excel('Book1.xlsx', usecols='A')
df_2=pd.read_excel('Book2.xlsx', usecols='B')
value_c_1 = df_1.value_counts()
value_c_2 = df_2.value_counts()
pd.concat([value_c_1, value_c_2], axis=1)

There must be an elegant way of doing this; my apologies if the answer is so obvious; I searched for it before posting this question.

Upvotes: 1

Views: 209

Answers (1)

AfterFray
AfterFray

Reputation: 1851

If your excel is like following dataframe :

df = pd.DataFrame({'A' : ['x','y','x','x', '','','',''],
              'B' : ['y', 'x','x','x','x','y','y','x',]})

You can try this :

df.apply(pd.value_counts).loc[['x','y']]

Upvotes: 1

Related Questions