WhoIsKi
WhoIsKi

Reputation: 117

how to count number of values in 1d array in one of column of Pandas dataframe

I have a data frame and in some of the rows, for one of the columns, I have a 1D array. for example: fig1

how I can count the number of values in the arrays in column data (separated by comma) for each row and show the number of them in a new column of new data frame same as fig 2:

fig2

Upvotes: 1

Views: 1752

Answers (1)

SeaBean
SeaBean

Reputation: 23217

You can use .str.len() to get the item count in lists in column data and then use .groupby() to aggregate the count of same name using .sum(), as follows:

df_out = (df['data'].str.len()
                    .groupby(df['name'], sort=False).sum()
         ).to_frame(name='data_count').reset_index()

Result:

print(df_out)


     name  data_count
0    john           6
1  amanda           0
2    sara           5

Edit

If the column data consists of strings looking like arrays/lists, instead of the 1D array as mentioned in the question, you can run the following code to convert the column into real arrays/lists first:

df['data'] = df['data'].str.strip('[]').str.replace("'", "").str.replace('"', '').replace('', np.nan).str.split(',').fillna({i: [] for i in df.index})

Test Run

Test Data Setup

nan = np.nan
# dict of dataframe dump by df.to_dict() as provided by OP in the comment:
data = {'name': {0: 'john', 1: 'amanda', 2: 'sara', 3: 'john'}, 'data': {0: '[a4G, bweQ, fp_dE4]', 1: nan, 2: '[H2dw45, IfC4, bAf23g, Lkfr54-op, a3dLa]', 3: '[Tr45b, kM30, riU91]'}}
df = pd.DataFrame(data)

df['data'] = df['data'].str.strip('[]').str.replace("'", "").str.replace('"', '').replace('', np.nan).str.split(',').fillna({i: [] for i in df.index})

Run solution codes

df_out = (df['data'].str.len()
                    .groupby(df['name'], sort=False).sum()
         ).to_frame(name='data_count').reset_index()

Result:

print(df_out)


     name  data_count
0    john           6
1  amanda           0
2    sara           5

Upvotes: 1

Related Questions