I have a DataFrame called `dft` of Netflix's TV shows and movies, with a column named "listed_in" whose entries are strings listing all the genres a title is classified under. Each row has a different number of genres, written as one string with the genres separated by commas.
A single entry looks something like 'Documentary', 'International TV Shows', 'Crime TV Shows'. Another row may list a different number of genres, some of which are the same as genres in other rows.
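If the column follows the usual layout of the Netflix dataset (an assumption here: genres joined by a comma followed by a space), splitting on `','` alone leaves a leading space on every genre after the first. In that case `' Crime TV Shows'` and `'Crime TV Shows'` are treated as different values. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'dft' frame (assumes ", " as separator)
dft = pd.DataFrame({"listed_in": [
    "Documentary, International TV Shows, Crime TV Shows",
    "Crime TV Shows, TV Dramas",
]})

# Splitting on "," alone keeps the space that follows each comma
raw = dft["listed_in"].str.split(",")
print(raw[0])  # ['Documentary', ' International TV Shows', ' Crime TV Shows']

# Stripping each piece gives clean genre names
clean = raw.apply(lambda parts: [p.strip() for p in parts])
print(clean[0])  # ['Documentary', 'International TV Shows', 'Crime TV Shows']
```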
Now I want to create a list of the unique genres across all rows. I tried:
```python
genres = []
for i in range(0, len(dft['listed_in'].str.split(','))):
    for j in range(0, len(dft['listed_in'].str.split(',')[i])):
        if (dft['listed_in'].str.split(',')[i][j]) not in genres:
            genres.append(dft['listed_in'].str.split(',')[i][j])
        else:
            pass
```
This keeps the kernel running indefinitely. The list is being built, though: if I interrupt the kernel after a while and print the list, the genres are there.
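A vectorized alternative, sketched here on hypothetical toy data and assuming the same `", "` separator, is to split the column once, use `Series.explode` (available since pandas 0.25) to put each genre on its own row, strip the whitespace, and deduplicate with `unique`:

```python
import pandas as pd

# Hypothetical toy data standing in for 'dft'
dft = pd.DataFrame({"listed_in": [
    "Documentary, International TV Shows, Crime TV Shows",
    "Crime TV Shows, TV Dramas",
    "Documentary, TV Dramas",
]})

genres = list(
    dft["listed_in"]
    .str.split(",")   # one list of genres per row
    .explode()        # one genre per row
    .str.strip()      # drop the space left after each comma
    .unique()         # distinct genres, in order of first appearance
)
print(genres)  # ['Documentary', 'International TV Shows', 'Crime TV Shows', 'TV Dramas']
```

`unique` keeps genres in order of first appearance; wrap the result in `sorted(...)` if alphabetical order is preferred.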
Next, I create a DataFrame from this list, intending to add a column with the number of times each genre appears in the original DataFrame:
```python
import pandas as pd

data = {'Genres': genres, 'count': [0 for i in range(0, len(genres))]}
gnr = pd.DataFrame(data=data)
```
Then, to fill the count column with each genre's number of occurrences:
```python
for i in range(0, 65):
    for j in range(0, 514):
        if gnr.loc[i, 'Genres'] in (dft['listed_in'].str.split(',').index[j]):
            gnr.loc[i, 'count'] = gnr.loc[i, 'count'] + dft['listed_in'].str.split(',').value_counts()[j]
        else:
            pass
```
Again, this code keeps running indefinitely, but after interrupting it I saw that the count for the first entry had been updated in the `gnr` DataFrame.
I don't know what is happening.
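The counting step can also be done without explicit loops. This sketch uses toy data, assumes the `", "` separator, and reuses the question's column names `Genres` and `count`. It builds a frame equivalent to `gnr` directly from `value_counts`:

```python
import pandas as pd

# Hypothetical toy data standing in for 'dft'
dft = pd.DataFrame({"listed_in": [
    "Documentary, International TV Shows, Crime TV Shows",
    "Crime TV Shows, TV Dramas",
    "Documentary, TV Dramas",
]})

gnr = (
    dft["listed_in"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()               # occurrences of each genre, most frequent first
    .rename_axis("Genres")        # name the index so it becomes the 'Genres' column
    .reset_index(name="count")    # turn the counts into a 'count' column
)
print(gnr)
```

Because `value_counts` already counts every genre, there is no need to build the genre list or the zero-filled count column first.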
Are you sure the process actually hangs? For loops over pandas objects are much slower than you might expect, especially with the number of iterations you are doing (65 × 514). If you haven't already, I'd put a `print(i)` in the loop so you can see which iteration you're on.
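Following the `print(i)` suggestion, here is a sketch of the first loop rewritten so that progress is visible. It also calls `str.split` only once. In the question's code, `dft['listed_in'].str.split(',')` re-splits the entire column several times per inner iteration, which plausibly accounts for most of the slowness. That diagnosis is an inference from the code rather than something the answer states. The toy data and the print interval are arbitrary.

```python
import time

import pandas as pd

# Hypothetical toy data standing in for 'dft'
dft = pd.DataFrame({"listed_in": [
    "Documentary, International TV Shows, Crime TV Shows",
    "Crime TV Shows, TV Dramas",
    "Documentary, TV Dramas",
]})

# Split the column once, outside the loops, into a plain list of lists
split_rows = dft["listed_in"].str.split(",").tolist()

genres = []
start = time.perf_counter()
for i, parts in enumerate(split_rows):
    for part in parts:
        genre = part.strip()  # remove the space left after each comma
        if genre not in genres:
            genres.append(genre)
    # Progress report, printed only every 1000 rows to avoid flooding the output
    if i % 1000 == 0:
        print(f"row {i}, {time.perf_counter() - start:.3f}s elapsed")

print(genres)
```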