Reputation: 11
I am trying to create a DataFrame of Twitter data. Using the Twitter API, I have a list of tweet objects (tweets) and want to populate a DataFrame with various fields from those objects, applying some other functions to the text. My current method uses a list comprehension for each column, iterating through all the tweets every time:
df = pd.DataFrame(data=[tweet.all_text for tweet in tweets], columns=["tweets"])
df.loc[:, 'id'] = np.array([tweet.id for tweet in tweets])
df.loc[:, 'len_tweet'] = np.array([len(tweet.all_text) for tweet in tweets])
df.loc[:, 'date_created'] = np.array([tweet.created_at_datetime for tweet in tweets])
df.loc[:, 'author'] = np.array([tweet.name for tweet in tweets])
df.loc[:, 'clean_tweet'] = np.array([self.clean_tweet_eng(tweet) for tweet in df.tweets])
df.loc[:, 'clean_stopwords_tweet'] = np.array([self.stopwords_clean(tweet) for tweet in df.tweets])
etc...
As I scale up the number of tweets, this becomes very slow.
I have looked at two other methods: building the dataframe by iteratively adding elements to a dictionary, and building it up one row at a time with iterrows so that the list of tweets is only cycled through once. Both seem to be slower.
What is the fastest way to achieve this?
Upvotes: 1
Views: 52
Reputation: 124
I think the simplest way would be to convert the list of tweet objects into one list of dictionaries, then load the data once:
import pandas as pd
list_of_dicts = [{'name': 'jon', 'age': 30}, {'name': 'paul', 'age': 26}]
df = pd.DataFrame(list_of_dicts)
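Applied to your case, that would look something like the sketch below: one pass over the tweets builds a dict per tweet, and the DataFrame is constructed once instead of via repeated column assignments. The attribute names (`all_text`, `created_at_datetime`, `name`) are taken from your question; the `Tweet` class and the two cleaning functions are placeholder stand-ins for the API objects and your `clean_tweet_eng` / `stopwords_clean` methods, so the example is self-contained.

```python
import pandas as pd

# Placeholder stand-ins so the sketch runs on its own. In the real code,
# the objects come from the Twitter API and the cleaning functions are
# the question's self.clean_tweet_eng / self.stopwords_clean methods.
class Tweet:
    def __init__(self, id, all_text, created_at_datetime, name):
        self.id = id
        self.all_text = all_text
        self.created_at_datetime = created_at_datetime
        self.name = name

def clean_tweet_eng(text):
    return text.lower()          # placeholder cleaning step

def stopwords_clean(text):
    return text                  # placeholder stop-word removal

tweets = [
    Tweet(1, "Hello World", "2020-01-01", "jon"),
    Tweet(2, "Pandas is fast", "2020-01-02", "paul"),
]

# Single pass: one dict per tweet, then one DataFrame construction.
records = [
    {
        "tweets": t.all_text,
        "id": t.id,
        "len_tweet": len(t.all_text),
        "date_created": t.created_at_datetime,
        "author": t.name,
        "clean_tweet": clean_tweet_eng(t.all_text),
        "clean_stopwords_tweet": stopwords_clean(t.all_text),
    }
    for t in tweets
]
df = pd.DataFrame(records)
```

This replaces seven separate comprehensions over the tweet list with one, and avoids the intermediate `np.array` conversions entirely; beyond that, the remaining cost is the per-tweet Python function calls themselves.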
Upvotes: 1