Rashid Abramov

Reputation: 45

How can I count values for each date in a DataFrame, conditional on the value of a column?

I have a dataframe with xenophobic and non-xenophobic tweets. For each day, I want to count the number of tweets that have a sentiment of 1.

This is the DataFrame df_unevaluated:

   sentiment  id            date                       text
0  0          9.820000e+17  2018-04-05 11:43:31+00:00  but if she had stated another fact like that I may have thought...
1  0          1.170000e+18  2019-09-03 22:53:30+00:00  the worst thing that dude has done this week is ramble about the...
2  0          1.140000e+18  2019-06-28 17:43:07+00:00  i think immigrants of all walks of life should be allowed into...
3  0          2.810000e+17  2012-12-18 00:43:57+00:00  why is america not treating the immigrants like normal people...
4  1          8.310000e+17  2017-02-14 01:42:26+00:00  who the hell wants to live in canada anyhow the people there...
...

This is what I've tried:

import pandas as pd

# Put all tweets with sentiment == 1 into a DataFrame
# (a boolean mask replaces the row-by-row append loop; DataFrame.append
# was removed in pandas 2.0, and df_xenophobic was never initialized)
df_xenophobic = df_unevaluated[df_unevaluated['sentiment'] == 1]

# Store a copy of df_xenophobic in df_counts
df_counts = df_xenophobic.copy()

# Change df_counts to get counts for each date
df_counts = (pd.to_datetime(df_counts['date'])
       .dt.floor('d')
       .value_counts()
       .rename_axis('date')
       .reset_index(name='count'))

# Sort data and drop index column
df_counts = df_counts.sort_values('date')
df_counts = df_counts.reset_index(drop=True)

# Look at data
df_counts.head()

This was the output:

           date                count
0   2012-03-14 00:00:00+00:00   1
1   2012-03-19 00:00:00+00:00   1
2   2012-04-07 00:00:00+00:00   1
3   2012-04-10 00:00:00+00:00   1
4   2012-04-19 00:00:00+00:00   1
...

This is what I expected:

              date              count
0   2012-03-14 00:00:00+00:00   1
1   2012-03-15 00:00:00+00:00   0
2   2012-03-16 00:00:00+00:00   0
3   2012-03-17 00:00:00+00:00   0
4   2012-03-18 00:00:00+00:00   0
5   2012-03-19 00:00:00+00:00   1
6   2012-03-20 00:00:00+00:00   0
7   2012-03-21 00:00:00+00:00   0
...

These are some links I've read through:

Python & Pandas - Group by day and count for each day

Using value_counts in pandas with conditions

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.floor.html

To be clearer, each date has the format YYYY-MM-DD HH:MM:SS+00:00

As seen in my attempt, I round the date column down to its day. My goal is to count the number of times sentiment = 1 on each day, including days where that count is 0.

Upvotes: 0

Views: 215

Answers (1)

Sadman Sakib

Reputation: 595

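If I understood your question correctly, it should be as simple as the following. Because Sentiment is either 0 or 1, summing it per day gives the number of tweets with a sentiment of 1 on that day: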

import pandas as pd

# Data Load
df = pd.DataFrame(data={'Date': ['2022-11-28 11:43:31+00:00', '2022-11-28 22:53:30+00:00', '2022-11-29 17:43:07+00:00', '2022-12-01 01:42:26+00:00', '2022-12-01 02:40:26+00:00'],
                      'Sentiment': [ 0, 1, 0, 1, 1]})

df['Date'] = pd.to_datetime(df['Date']).dt.date

df_counts = df.groupby(by=['Date']).sum().reset_index()

The df_counts data frame should give output like this:

         Date  Sentiment
0  2022-11-28          1
1  2022-11-29          0
2  2022-12-01          2
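
Note that the groupby approach only returns days that have at least one tweet, while the expected output in the question also lists days with a count of 0. One possible way to get those is to resample by day instead of grouping. This is only a sketch: the sample data is made up, and I am assuming the question's column names (`sentiment`, `date`) and that sentiment only takes the values 0 and 1.

```python
import pandas as pd

# Hypothetical sample data using the question's column names
df_unevaluated = pd.DataFrame({
    'sentiment': [0, 1, 0, 1, 1],
    'date': ['2022-11-28 11:43:31+00:00', '2022-11-28 22:53:30+00:00',
             '2022-11-29 17:43:07+00:00', '2022-12-01 01:42:26+00:00',
             '2022-12-01 02:40:26+00:00'],
})

# Parse the timestamps; utc=True keeps them timezone-aware
dates = pd.to_datetime(df_unevaluated['date'], utc=True)

# 1 where sentiment == 1, else 0, indexed by the tweet timestamp
is_xenophobic = df_unevaluated['sentiment'].eq(1).astype(int)
is_xenophobic.index = pd.DatetimeIndex(dates)

# resample('D') makes one bin per calendar day between the first and
# last timestamp, so days without any tweets show up with a sum of 0
df_counts = (is_xenophobic.resample('D').sum()
             .rename_axis('date')
             .reset_index(name='count'))

print(df_counts)
```

With this sample data the count column comes out as 1, 0, 0, 2, with 2022-11-30 included even though it has no tweets. The range runs from the first to the last tweet of any sentiment; if it should start at the first tweet with sentiment 1 instead, filter the rows before resampling.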

Upvotes: 1
