coral
coral

Reputation: 56

Finding frequency of items in cell of column pandas

I have DataFrame with almost 500 rows and 3 columns.

One of the columns has a string of dates and each cell has a unique date, but some cell have a common date and some cells are seem empty.

I'm trying to find the frequency of each day in a cell

df|Number_of_dates     | Date   
--|--------------------|---------------------
0 | 0.0                |  []
1 | 3.0                |  ['2006-01-01' '2006-03-22' '2019-07-29']
2 | 8.0                |  ['2006-01-01' '2006-04-13' '2006-07-18' '2006-...
3 | 1.0                |  ['2006-07-18']
4 | 1.0                |  ['2019-07-29']
5 | 0.0                |  []
6 | 397.0              |  ['2019-01-02' '2019-01-03' '2019-01-04' '2019-...

Result:

df_1 |Date         | Frequency 
-----|------------ |---------------------
0    | 2006-01-01  |2
1    | 2006-03-22  |1
2    | 2006-04-13  |1
3    | 2006-07-18  |2
4    | 2019-07-29  |3

It would be very helpful if you could provide some guidance.

Thanks in advance

additional information: I noticed that each cell has a string value instead of a list

Sample DataFrame

d = {"Date":[ "['2005-02-02' '2005-05-04' '2005-08-03' '2005-11-02' '2006-02-01' '2006-05-03']",
             "['2006-01-31' '2006-02-01' '2006-03-16'\n '2006-06-13']",
             "['2005-10-12' '2005-10-13' '2005-10-14'\n '2005-10-17']",
             "[]",
             "['2005-07-25' '2005-07-26' '2005-07-27'\n '2005-07-28' '2005-07-29' '2005-08-01' '2005-08-02' '2005-08-03'\n '2005-08-04' '2005-08-05']",
             "['2005-03-15' '2005-03-16' '2005-03-17'\n '2005-03-18' '2005-03-21' '2005-03-22' '2005-03-23' '2005-03-24' \n'2005-03-28' '2005-03-29' '2005-03-30' '2005-03-31' '2005-04-01'\n '2005-04-04']",
             "['2005-03-16' '2005-03-17' '2005-07-27'\n '2006-06-13']",
             "['2005-02-02' '2005-05-04' '2005-03-16' '2005-03-17']",
             "[]"
            ]
    }

df = pd.DataFrame(d)

Upvotes: 0

Views: 80

Answers (1)

jezrael
jezrael

Reputation: 862481

Use DataFrame.explode with GroupBy.size:

#create list from sample data
df['Date'] = df['Date'].str.strip('[]').str.split()
df_1 = df.explode('Date').groupby('Date').size().reset_index(name='Frequency')

print (df_1.head(10))
            Date  Frequency
0   '2005-02-02'          2
1   '2005-03-15'          1
2   '2005-03-16'          3
3   '2005-03-17'          3
4   '2005-03-18'          1
5   '2005-03-21'          1
6   '2005-03-22'          1
7   '2005-03-23'          1
8   '2005-03-24'          1
9   '2005-03-28'          1

Upvotes: 2

Related Questions