Mohd Abdul Azeem
Mohd Abdul Azeem

Reputation: 152

how to create columns based on same date

I have the dataset having columns....

                                created_at        date      time  timezone  \
0  2021-06-03 09:01:59 India Standard Time  2021-06-03  09:01:59       530
1  2021-06-03 09:01:41 India Standard Time  2021-06-03  09:01:41       530
2  2021-06-03 07:32:58 India Standard Time  2021-06-03  07:32:58       530
3  2021-06-03 07:31:55 India Standard Time  2021-06-03  07:31:55       530
4  2021-06-03 06:00:52 India Standard Time  2021-06-03  06:00:52       530

                                             tweet 
0  "Advertisers offering #cryptocurrency exchange...
1  Beijing to Disperse $6 Million in Digital Yuan...
2  “Combating ransomware is a priority for the ad...
3  Guggenheim has registered a fund with the SEC ...
4  The most recent 2009 spend moved last year, an...

The complete link to the dataset is here. I want to create columns based on dates. Like for the date 2021-06-03, each headline should have one columns and so on for other date. For reference to how I want the columns can be found here in Combined_news.csv file. The below code is my approach.

from collections import defaultdict

d = defaultdict(list)

i = 1
j = 1
while i < btc_news.shape[0]:
    if btc_news.loc[i, 'date'] == btc_news.loc[i-1, 'date']:
        temp = 'headline' + str(j)
        d[temp].append(btc_news.loc[i-1, 'tweet'])
        j += 1
        i += 1
        continue
    else:
        temp = 'headline' + str(j)
        d[temp].append(btc_news.loc[i-1, 'tweet'])
        d['date'].append(btc_news.loc[i-1, 'date'])
    j = 1
    i += 1

I want the output in the way which I mentioned in Combined_News.csv. I want all the headlines for that particular date in different columns along with the date column. I hope you understand my question.

Upvotes: 0

Views: 119

Answers (1)

not_speshal
not_speshal

Reputation: 23146

Might not be the most efficient solution, but this works.

First, you groupby the date and concatenate all the tweets for one date:

df2 = df.groupby("date").apply(lambda x: x["tweet"].to_list())

Next, you split the list into individual columns:

output = pd.DataFrame(df2.values.tolist()).add_prefix("top_").set_index(df2.index)

The output format is:

>>> output.head()
                                                        top_0  ... top_35
date                                                           ...       
2015-07-12  Bitcoin the Next Logical Step in the Rise of U...  ...   None
2015-07-13  BitGive Foundation Announces New Initiatives a...  ...   None
2015-07-14  Keynote 2015: Harnessing the Distributed Ledge...  ...   None
2015-07-15  Patrick Byrne Says Wil                        ...  ...   None
2015-07-16  2015 Q1 Bitcoin Investment Trumps 2014 Numbers...  ...   None

Upvotes: 1

Related Questions