Python Pandas: Series and getting value from data frame counts null entries?

Question

I have a CSV file with 7000 rows and 5 columns.

I have an array of 5000 words that I want to add to the same CSV file as a new column. I added a column 'originalWord' and used pd.Series, which put the 5000 words in a single column as I wanted.

allWords = ['x'] * 5000  # placeholder for the list of 5000 words
df['originalWord'] = pd.Series(allWords)

My problem now is that I want to get the data in the column 'originalWord', either by putting it in an array or by accessing the column directly. The column holds only 5000 words, but the file has 7000 rows, so the last 2000 entries are null values and the length comes back as 7000:

print(len(df['originalWord']))

7000

Any idea how to make it reflect the original length of 5000? Thank you.

RDoc · Accepted Answer

If I understand you correctly, what you're asking for isn't possible. From what I can gather, you have a DataFrame with 7000 rows and 5 columns, meaning that its index is of size 7000. To this DataFrame, you would like to add a column that has only 5000 values. Every column of a DataFrame shares the same index, so the new column also has 7000 entries, and the 2000 rows without a matching value are filled with NaN. That's why you see the length as 7000.
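The alignment described above can be reproduced with a small sketch. The DataFrame here is a hypothetical stand-in for the CSV from the question (one placeholder column named "existing" instead of the real five), so only the row counts match the original setup.

```python
import pandas as pd

# Hypothetical stand-in for the 7000-row CSV (one placeholder column).
df = pd.DataFrame({"existing": range(7000)})

# 5000 placeholder words, as in the question.
all_words = ["x"] * 5000

# Assigning a Series aligns on the index: labels 0..4999 receive a word,
# labels 5000..6999 have no matching entry and become NaN.
df["originalWord"] = pd.Series(all_words)

total_rows = len(df["originalWord"])             # size of the shared index
missing = int(df["originalWord"].isna().sum())   # rows left without a word

print(total_rows, missing)
```

len() reports the size of the index, not the number of values that were actually assigned, which is why the two numbers differ.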

In short, there is no way to access df['originalWord'] and have it automatically exclude all missing values, since that Series also has an index of size 7000. The closest you can get is to call dropna() on the column; if you find it bothersome to call it repeatedly, you could wrap it in a small function.
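A minimal sketch of such a wrapper might look like the following. The helper name non_null_column and the sample DataFrame are assumptions made for illustration; dropna(), tolist() and count() are the standard pandas Series methods.

```python
import pandas as pd

def non_null_column(frame, column):
    """Hypothetical helper: return the column with missing values removed."""
    return frame[column].dropna()

# Hypothetical stand-in for the data in the question.
df = pd.DataFrame({"existing": range(7000)})
df["originalWord"] = pd.Series(["x"] * 5000)

words = non_null_column(df, "originalWord")   # Series without the NaN rows
word_list = words.tolist()                    # plain Python list of the words

# If only the number of non-null entries is needed, count() skips NaN.
non_null_count = int(df["originalWord"].count())

print(len(words), len(word_list), non_null_count)
```

The column stored in the DataFrame is unchanged by this; dropna() returns a new, shorter Series each time it is called.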
