Reputation: 245
I have a dataframe df with a column hashtags such that:
df['hashtags']
>>>
0 NaN
1 NaN
2 ['COVID19']
3 ['COVID19']
4 ['CoronaVirusUpdates', 'COVID19']
...
132596 ['coronacrise', 'covid19', 'JN', 'NãoÉSóUmNúme...
132597 ['covid19']
132598 ['corona', 'covid19']
132599 NaN
132600 ['covid19']
Name: hashtags, Length: 132601, dtype: object
I want to create a single list containing all the elements of the lists in the column (excluding the NaN values).
I have tried to make a list of lists by:
li = df['hashtags'].tolist()
But the lists come out as strings, so I end up with a list of strings. For example:
li[:5]
>>>
[nan, nan, "['COVID19']", "['COVID19']", "['CoronaVirusUpdates', 'COVID19']"]
My desired output for li[:5] is like:
['COVID19', 'COVID19', 'CoronaVirusUpdates', 'COVID19', 'coronavirus', 'covid19']
Upvotes: 1
Views: 1055
Reputation: 862406
The idea is to first remove the missing values with Series.dropna, then convert each string representation of a list into an actual list with ast.literal_eval, and finally flatten the nested lists in a list comprehension:
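import ast
import numpy as np
import pandas as pd

# Sample data: the lists are stored as strings, as in the question
df = pd.DataFrame({'hashtags': [np.nan, np.nan,
                                "['COVID19']", "['COVID19']",
                                "['CoronaVirusUpdates', 'COVID19']"]})

# Drop NaN, parse each string into a list, and flatten the result
out = [y for x in df['hashtags'].dropna() for y in ast.literal_eval(x)]
print(out)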
['COVID19', 'COVID19', 'CoronaVirusUpdates', 'COVID19']
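If you already have the plain Python list from df['hashtags'].tolist(), as in the question, the same parse-and-flatten idea should work without pandas. This is only a minimal sketch, assuming every non-missing entry is a string holding a list literal; the names li and flat are illustrative, with li hard-coded to mirror the question's li[:5]:

```python
import ast
from itertools import chain

# Hypothetical stand-in for df['hashtags'].tolist()[:5] from the question
li = [float('nan'), float('nan'),
      "['COVID19']", "['COVID19']",
      "['CoronaVirusUpdates', 'COVID19']"]

# Keep only the string entries (NaN is a float, so it is skipped),
# parse each one into a real list, then chain the lists together
flat = list(chain.from_iterable(
    ast.literal_eval(x) for x in li if isinstance(x, str)
))
print(flat)
```

The isinstance check is used instead of comparing against NaN because NaN never compares equal to itself, so a test like x != np.nan would not filter anything out.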
Upvotes: 2