What parameter should we use to include the URL (link) of each tweet? I already have the date, username, and content. A second question: how can we convert the dates in the DataFrame to GMT+8? They are currently in UTC. Please see the code below for reference:
```python
import snscrape.modules.twitter as sntwitter
import pandas as pd

query = "(from:elonmusk) until:2023-01-28 since:2023-01-27"
tweets = []
limit = 100000

for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    if len(tweets) == limit:
        break
    else:
        tweets.append([tweet.date, tweet.username, tweet.content])

df = pd.DataFrame(tweets, columns=['Date', 'Username', 'Tweet'])

# Save to csv
df.to_csv('tweets.csv')
df
```
`get_items()` yields the search results one at a time, each as a `Tweet` object, and takes no limit argument, so the number of tweets has to be counted inside the `for` loop. Fetching 100K tweets is possible but takes a long time, so I reduced the limit to 1K tweets.
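As an alternative to a manual counter, `itertools.islice` can cap any iterator. A minimal sketch, assuming a hypothetical `fake_get_items()` generator that stands in for `get_items()` so the example runs offline:

```python
from itertools import islice
from types import SimpleNamespace


def fake_get_items():
    """Hypothetical stand-in for TwitterSearchScraper(query).get_items():
    an endless generator yielding tweet-like objects one at a time."""
    n = 0
    while True:
        n += 1
        yield SimpleNamespace(id=n, rawContent=f"tweet {n}")


limit = 1000
# islice stops pulling items from the generator after `limit` of them,
# so no counter variable or explicit break is needed.
rows = [[t.id, t.rawContent] for t in islice(fake_get_items(), limit)]
print(len(rows))  # 1000
```

With the real scraper this would be `islice(sntwitter.TwitterSearchScraper(query).get_items(), limit)`; no further items are requested from the generator once the limit is reached.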
```python
import snscrape.modules.twitter as sntwitter
import pandas as pd

query = 'from:elonmusk since:2022-08-01 until:2023-01-28'
limit = 1000

rows = []
for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    if len(rows) == limit:
        break
    # Build the tweet's link from the author's username and the tweet id
    URL = "https://twitter.com/{0}/status/{1}".format(tweet.user.username, tweet.id)
    rows.append({'Date': tweet.date, 'URL': URL, 'Tweet': tweet.rawContent})

# Build the DataFrame once instead of calling pd.concat inside the loop
df = pd.DataFrame(rows, columns=['Date', 'URL', 'Tweet'])

# Convert the time zone from UTC to GMT+8.
# The "Etc/" zones use the POSIX sign convention:
# 'Etc/GMT-8' is UTC+8, while 'Etc/GMT+8' would be UTC-8.
df['Date'] = df['Date'].dt.tz_convert('Etc/GMT-8')
print(df)
df.to_csv('tweets.csv')
```
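The sign inversion of the "Etc/" zones is easy to check offline. A minimal sketch with pandas, using the `date` value from the sample tweet JSON in this answer (`Asia/Manila` is only one example of a named zone that is UTC+8 all year; any equivalent zone works):

```python
import pandas as pd

# One UTC timestamp (the "date" field of the sample tweet JSON).
df = pd.DataFrame({"Date": pd.to_datetime(["2023-01-28T02:44:31+00:00"], utc=True)})

# "Etc/" zones use the POSIX sign convention, so "Etc/GMT-8" is UTC+8.
df["GMT+8"] = df["Date"].dt.tz_convert("Etc/GMT-8")

# A named zone that is UTC+8 all year gives the same local time.
df["Manila"] = df["Date"].dt.tz_convert("Asia/Manila")

# The trap: "Etc/GMT+8" is UTC-8, which yields -08:00 offsets.
df["UTC-8"] = df["Date"].dt.tz_convert("Etc/GMT+8")

print(df.loc[0, "GMT+8"])  # 2023-01-28 10:44:31+08:00
print(df.loc[0, "UTC-8"])  # 2023-01-27 18:44:31-08:00
```

If the CSV will be opened in a tool that does not understand offsets, calling `.dt.tz_localize(None)` after the conversion drops the `+08:00` suffix while keeping the local wall-clock time.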
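Each item yielded by `get_items()` carries many fields (one item is dumped as JSON below), so only the required values need to be extracted:

- `tweet.date` -> Date
- `https://twitter.com/{tweet.user.username}/status/{tweet.id}` -> URL
- `tweet.rawContent` -> Tweet

The `Tweet` object also has a ready-made `url` field (visible in the dump), so `tweet.url` gives the same link directly.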
```json
{
  "_type": "snscrape.modules.twitter.Tweet",
  "url": "https://twitter.com/elonmusk/status/1619164489710178307",
  "date": "2023-01-28T02:44:31+00:00",
  "rawContent": "@tn_daki @ShitpostGate Yup",
  "renderedContent": "@tn_daki @ShitpostGate Yup",
  "id": 1619164489710178307,
  "user": {
    "_type": "snscrape.modules.twitter.User",
    "username": "elonmusk",
    "id": 44196397,
    "displayname": "Mr. Tweet",
    "rawDescription": "",
    "renderedDescription": "",
    "descriptionLinks": null,
    "verified": true,
    "created": "2009-06-02T20:12:29+00:00",
    "followersCount": 127536699,
    "friendsCount": 176,
    "statusesCount": 22411,
    "favouritesCount": 17500,
    "listedCount": 113687,
    "mediaCount": 1367,
    "location": "",
    "protected": false,
    "link": null,
    "profileImageUrl": "https://pbs.twimg.com/profile_images/1590968738358079488/IY9Gx6Ok_normal.jpg",
    "profileBannerUrl": "https://pbs.twimg.com/profile_banners/44196397/1576183471",
    "label": null,
    "url": "https://twitter.com/elonmusk"
  }
  ... cut off
```
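To confirm that the built link and the ready-made `url` field agree, here is a minimal sketch that parses a trimmed copy of the JSON dump above with the standard `json` module (inside the scraper loop the same values are plain attributes such as `tweet.url`, `tweet.id`, and `tweet.user.username`):

```python
import json

# A trimmed copy of the sample JSON dump (most user fields omitted).
raw = """{
  "url": "https://twitter.com/elonmusk/status/1619164489710178307",
  "date": "2023-01-28T02:44:31+00:00",
  "rawContent": "@tn_daki @ShitpostGate Yup",
  "id": 1619164489710178307,
  "user": {"username": "elonmusk", "followersCount": 127536699}
}"""
tweet = json.loads(raw)

# Build the URL from username and id, as the answer's loop does...
built_url = "https://twitter.com/{0}/status/{1}".format(
    tweet["user"]["username"], tweet["id"]
)

# ...and compare it with the "url" field the tweet already carries.
row = {"Date": tweet["date"], "URL": tweet["url"], "Tweet": tweet["rawContent"]}
print(built_url == row["URL"])  # True
```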
Output of a run that used `tz_convert('Etc/GMT+8')` (UTC-8 under the POSIX sign convention of the "Etc/" zones, hence the -08:00 offsets) and built the DataFrame with repeated `pd.concat` (hence every row index is 0):

```
>python get-data.py
                        Date                                                URL                                              Tweet
0  2023-01-27 15:29:36-08:00  https://twitter.com/elonmusk/status/1619115435...                                @farzyness No way
0  2023-01-27 15:14:05-08:00  https://twitter.com/elonmusk/status/1619111533...  @mtaibbi Please correct your bs @PolitiFact &a...
0  2023-01-27 14:52:55-08:00  https://twitter.com/elonmusk/status/1619106207...  @WallStreetSilv A quarter of all taxes just to...
0  2023-01-27 13:28:26-08:00  https://twitter.com/elonmusk/status/1619084945...           @nudubabba @mikeduncan Yeah, whole thing
0  2023-01-27 13:12:16-08:00  https://twitter.com/elonmusk/status/1619080876...  @TaraBull808 That’s way more monkeys than the ...
..                       ...                                                ...                                                ...
0  2022-12-14 11:14:53-08:00  https://twitter.com/elonmusk/status/1603106271...  @Jason Advertising revenue next year will be l...
0  2022-12-14 04:08:43-08:00  https://twitter.com/elonmusk/status/1602999020...                        @Balyx_ He would be welcome
0  2022-12-14 03:42:47-08:00  https://twitter.com/elonmusk/status/1602992493...  @NorwayMFA @TwitterSupport @jonasgahrstore @AH...
0  2022-12-14 03:35:14-08:00  https://twitter.com/elonmusk/status/1602990594...                                    @AvidHalaby Wow
0  2022-12-14 03:35:03-08:00  https://twitter.com/elonmusk/status/1602990549...                      @AvidHalaby Live & learn …

[1000 rows x 3 columns]
```
For more detail on converting time zones in a pandas DataFrame, see the post "Converting time zone pandas dataframe".
Example: the second row of the output links to

URL = "https://twitter.com/elonmusk/status/1619111533216403456"

The printout truncates the URL, but the full value is saved in the CSV file:

```
0,2023-01-27 15:14:05-08:00,https://twitter.com/elonmusk/status/1619111533216403456,@mtaibbi Please correct your bs @PolitiFact & @snopes
```

The tweet at that URL matches the text in the pandas `Tweet` column.
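The `...` in the printed DataFrame is display truncation only; the CSV keeps the full URL. A minimal sketch that round-trips the example row through an in-memory CSV (`io.StringIO` stands in for `tweets.csv`):

```python
import io

import pandas as pd

url = "https://twitter.com/elonmusk/status/1619111533216403456"
df = pd.DataFrame({
    "Date": ["2023-01-27 15:14:05-08:00"],
    "URL": [url],
    "Tweet": ["@mtaibbi Please correct your bs @PolitiFact & @snopes"],
})

# Show full cell contents instead of truncating them with "..."
with pd.option_context("display.max_colwidth", None):
    print(df)

# Write to an in-memory CSV (stand-in for tweets.csv) and read it back.
buf = io.StringIO()
df.to_csv(buf, index=False)  # index=False drops the leading "0," column
buf.seek(0)
back = pd.read_csv(buf)
print(back.loc[0, "URL"] == url)  # True
```

Passing `index=False` to `to_csv` is optional; it only removes the row-index column that appears as the leading `0,` in the CSV line above.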
You can also add columns for the user's `followersCount`, `friendsCount`, `statusesCount`, `favouritesCount`, `listedCount`, and `mediaCount` (fields of `tweet.user`, as in the JSON above), and for the tweet's `replyCount`, `retweetCount`, `likeCount`, and `viewCount`.
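One way to add those columns without the loop crashing when a field is missing in a given snscrape version is `getattr` with a default. A minimal sketch, assuming a hypothetical `tweet_to_row` helper and a `SimpleNamespace` stand-in built from the user fields in the JSON dump (the tweet-level counts were cut off there, so the stand-in leaves them out and they come back as `None`):

```python
from types import SimpleNamespace

USER_FIELDS = ["followersCount", "friendsCount", "statusesCount",
               "favouritesCount", "listedCount", "mediaCount"]
TWEET_FIELDS = ["replyCount", "retweetCount", "likeCount", "viewCount"]


def tweet_to_row(tweet):
    """Hypothetical helper: flatten one tweet-like object into a dict.
    Attributes that are absent become None instead of raising."""
    row = {"Date": tweet.date, "URL": tweet.url, "Tweet": tweet.rawContent}
    for name in USER_FIELDS:
        row[name] = getattr(tweet.user, name, None)
    for name in TWEET_FIELDS:
        row[name] = getattr(tweet, name, None)
    return row


# Hypothetical stand-in built from the sample JSON dump; tweet-level counts
# are deliberately omitted because they were cut off in that dump.
sample = SimpleNamespace(
    date="2023-01-28T02:44:31+00:00",
    url="https://twitter.com/elonmusk/status/1619164489710178307",
    rawContent="@tn_daki @ShitpostGate Yup",
    user=SimpleNamespace(followersCount=127536699, friendsCount=176,
                         statusesCount=22411, favouritesCount=17500,
                         listedCount=113687, mediaCount=1367),
)
row = tweet_to_row(sample)
print(row["followersCount"], row["likeCount"])  # 127536699 None
```

Inside the scraping loop this would be `rows.append(tweet_to_row(tweet))`, followed by `pd.DataFrame(rows)` once the loop finishes.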