Marc

Reputation: 25

How to get length of array for each row in pandas (python)?

I have a JSON array with data like

[{
    'Id': 2837,
    'Date': datetime.date(2020, 1, 1),
    'Caption': 'asdf'
}, {
    'Id': 2838,
    'Date': datetime.date(2020, 1, 1),
    'Caption': 'fasd'
}, {
    'Id': 2839,
    'Date': datetime.date(2020, 1, 4),
    'Caption': 'sdfa'
}]

and a dataframe with a row for each date

  DATE       LEN
0 2020-01-01 NaN
1 2020-01-02 NaN
2 2020-01-03 NaN
3 2020-01-04 NaN
4 2020-01-05 NaN
5 2020-01-06 NaN

Now I would like to count, for each date, the elements in the JSON array and write the counts into the dataframe. I tried

df['LEN'] = len([x for x in jsonArray if x['Date'] == df['DATE']])

but it's not working. I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
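
If I understand the error right, x['Date'] == df['DATE'] compares one date against the whole DATE column, so the if condition inside the comprehension gets a boolean Series instead of a single True/False (just my attempt to illustrate it):

mask = jsonArray[0]['Date'] == df['DATE']
print(type(mask))  # <class 'pandas.core.series.Series'>, not a plain bool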

What am I doing wrong?

Upvotes: 1

Views: 3218

Answers (3)

null

Reputation: 2137

You can do this easily with join and groupby.
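
(This assumes jsonArray and df already exist as in the question; a minimal sketch of that setup, reusing the question's names, might be:)

import datetime
import pandas as pd

jsonArray = [
    {'Id': 2837, 'Date': datetime.date(2020, 1, 1), 'Caption': 'asdf'},
    {'Id': 2838, 'Date': datetime.date(2020, 1, 1), 'Caption': 'fasd'},
    {'Id': 2839, 'Date': datetime.date(2020, 1, 4), 'Caption': 'sdfa'},
]
df = pd.DataFrame({'DATE': pd.date_range('2020-01-01', '2020-01-06'),
                   'LEN': float('nan')})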

Code

df_json = pd.DataFrame(jsonArray)
df_json['Date'] = df_json['Date'].astype(str)

# align the column name/dtype of the question's dataframe, then left-join
df['DATE'] = df['DATE'].astype(str)
df = df.set_index('DATE').join(df_json.set_index('Date'), how='left')

# count non-null Ids per date; dates with no match in jsonArray get 0
grp_df = df.reset_index().groupby('DATE', as_index=False)['Id'].count()
grp_df.rename(columns={'DATE': 'Date', 'Id': 'Len'}, inplace=True)

Output

         Date  Len
0  2020-01-01    2
1  2020-01-02    0
2  2020-01-03    0
3  2020-01-04    1
4  2020-01-05    0
5  2020-01-06    0

Upvotes: 0

RichieV

Reputation: 5183

You can parse jsonArray into a dataframe and use groupby to count the rows for each date.

Sample data

import datetime
import pandas as pd

jsonArray = [
    {'Id': 2837, 'Date': datetime.date(2020, 1, 1), 'Caption': 'asdf'},
    {'Id': 2838, 'Date': datetime.date(2020, 1, 1), 'Caption': 'fasd'},
    {'Id': 2839, 'Date': datetime.date(2020, 1, 4), 'Caption': 'sdfa'},
]
df_dates = pd.DataFrame({
    'DATE': pd.date_range('2020-01-01', '2020-01-06'),
    'LEN': float('nan')
})

The code

# parse jsonArray to dataframe
df = pd.DataFrame(jsonArray)

# count rows for each Date
df = df.groupby('Date').size().rename('LEN')

# set DATE to index so `df.update` can align
df_dates.set_index('DATE', inplace=True)
df_dates.update(df)

Output

            LEN
DATE
2020-01-01  2.0
2020-01-02  NaN
2020-01-03  NaN
2020-01-04  1.0
2020-01-05  NaN
2020-01-06  NaN
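
If you also want the dates with no entries shown as 0 and DATE back as a regular column (just an optional clean-up, not part of the alignment trick above):

# optional: replace NaN with 0 and restore DATE as a column
df_dates['LEN'] = df_dates['LEN'].fillna(0)
df_dates.reset_index(inplace=True)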

Upvotes: 1

zabop

Reputation: 7852

# necessary imports
import datetime
import pandas as pd

Let's say you have this list:

ourlist = \
[{
    'Id': 2837,
    'Date': datetime.date(2020, 1, 1),
    'Caption': 'asdf'
},{
    'Id': 2838,
    'Date': datetime.date(2020, 1, 1),
    'Caption': 'fasd'
},{
    'Id': 2839,
    'Date': datetime.date(2020, 1, 4),
    'Caption': 'sdfa'
}]

and a dataframe, for example:

df = pd.DataFrame([datetime.datetime(2020,1,1),
                   datetime.datetime(2020,1,2),
                   datetime.datetime(2020,1,3),
                   datetime.datetime(2020,1,4)],
                  columns=['DATE'])

Then you can do:

df['LEN'] = df['DATE'].apply(
    lambda datetimeobject: sum(datetimeobject == dates
                               for dates in [eachdict['Date'] for eachdict in ourlist]))

or, more concisely, using list.count (as in "How can I count the occurrences of a list item?" on Stack Overflow):

df['LEN'] = df['DATE'].apply(
    lambda datetimeobject: [eachdict['Date'] for eachdict in ourlist].count(datetimeobject))

(In both cases, I am using a list comprehension to get the Date entries from ourlist, a list of dictionaries.)
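
For reference, that inner comprehension by itself evaluates to the plain list of dates that each lambda works against:

[eachdict['Date'] for eachdict in ourlist]
# [datetime.date(2020, 1, 1), datetime.date(2020, 1, 1), datetime.date(2020, 1, 4)]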

Both of them will result in a df:

    DATE        LEN
0   2020-01-01  2
1   2020-01-02  0
2   2020-01-03  0
3   2020-01-04  1

Upvotes: 0
