Reputation: 117
I have a data frame and in some of the rows, for one of the columns, I have a 1D array. for example:
how I can count the number of values in the arrays in column data (separated by comma) for each row and show the number of them in a new column of new data frame same as fig 2:
Upvotes: 1
Views: 1752
Reputation: 23217
You can use .str.len()
to get the item count in lists in column data
and then use .groupby()
to aggregate the count of same name
using .sum()
, as follows:
df_out = (df['data'].str.len()
.groupby(df['name'], sort=False).sum()
).to_frame(name='data_count').reset_index()
Result:
print(df_out)
name data_count
0 john 6
1 amanda 0
2 sara 5
If the column data
consists of strings looking like arrays/lists, instead of the 1D array as mentioned in the question, you can run the following code to convert the column into real arrays/lists first:
df['data'] = df['data'].str.strip('[]').str.replace("'", "").str.replace('"', '').replace('', np.nan).str.split(',').fillna({i: [] for i in df.index})
Test Data Setup
nan = np.nan
# dict of dataframe dump by df.to_dict() as provided by OP in the comment:
data = {'name': {0: 'john', 1: 'amanda', 2: 'sara', 3: 'john'}, 'data': {0: '[a4G, bweQ, fp_dE4]', 1: nan, 2: '[H2dw45, IfC4, bAf23g, Lkfr54-op, a3dLa]', 3: '[Tr45b, kM30, riU91]'}}
df = pd.DataFrame(data)
df['data'] = df['data'].str.strip('[]').str.replace("'", "").str.replace('"', '').replace('', np.nan).str.split(',').fillna({i: [] for i in df.index})
Run solution codes
df_out = (df['data'].str.len()
.groupby(df['name'], sort=False).sum()
).to_frame(name='data_count').reset_index()
Result:
print(df_out)
name data_count
0 john 6
1 amanda 0
2 sara 5
Upvotes: 1