user2233834

Reputation: 373

Pandas Split-Apply-Combine

I have a dataset with UserIDs, Tweets and CreatedDates. Each UserID has multiple tweets created at different dates. I want to find the frequency of tweets, and I've written a small calculation that gives me the number of tweets per hour per UserID. I used groupby to do this; the code is as follows:

  twitterDataFrame = twitterDataFrame.set_index(['CreatedAt'])
  tweetsByEachUser = twitterDataFrame.groupby('UserID')
  numberOfHoursBetweenFirstAndLastTweet = (tweetsByEachUser['CreatedAtForCalculations'].first() - tweetsByEachUser['CreatedAtForCalculations'].last()).astype('timedelta64[h]')
  numberOfTweetsByTheUser = tweetsByEachUser.size()
  frequency = numberOfTweetsByTheUser  / numberOfHoursBetweenFirstAndLastTweet

When I print the value of frequency, I get:

  UserID
  807095       5.629630
  28785486     2.250000
  134758540    8.333333

Now I need to go back into my big data frame (twitterDataFrame) and add these values alongside the correct UserIDs. How can I do that? I'd like to say

twitterDataFrame['frequency'] = the frequency corresponding to the UserID, i.e. for each row, take twitterDataFrame['UserID'] and use the frequency value we got for that user above.

However, I am not sure how to do this. Would anyone know how I can achieve it?

Upvotes: 1

Views: 596

Answers (1)

behzad.nouri

Reputation: 77991

You can join the frequency object you created back onto the data frame on UserID, or do it in one step with transform:

# tweets per hour for one user's timestamps; max/min so row order does not matter
get_freq = lambda ts: len(ts) / ((ts.max() - ts.min()).total_seconds() / 3600)
twitterDataFrame['frequency'] = twitterDataFrame.groupby('UserID')['CreatedAtForCalculations'].transform(get_freq)
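
For the join route, a minimal sketch might look like the following. The sample rows are made up for illustration, and it assumes CreatedAtForCalculations is already a datetime column; it also uses max/min rather than first/last so the result does not depend on how the rows are sorted.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
twitterDataFrame = pd.DataFrame({
    "UserID": [807095, 807095, 807095, 28785486, 28785486],
    "CreatedAtForCalculations": pd.to_datetime([
        "2014-03-01 10:00", "2014-03-01 11:00", "2014-03-01 12:00",
        "2014-03-02 08:00", "2014-03-02 12:00",
    ]),
})

created = twitterDataFrame.groupby("UserID")["CreatedAtForCalculations"]

# Hours between each user's earliest and latest tweet (float, indexed by UserID).
hours = (created.max() - created.min()) / pd.Timedelta(hours=1)

# Tweets per hour per user; a user whose tweets span zero time gets inf.
frequency = created.size() / hours

# Look up each row's UserID in the per-user result and store it as a new column.
twitterDataFrame["frequency"] = twitterDataFrame["UserID"].map(frequency)
print(twitterDataFrame)
```

Here map does the lookup by UserID; twitterDataFrame.join(frequency.rename("frequency"), on="UserID") should give the same column if you prefer an explicit join.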

Upvotes: 2
