Reputation: 335
We have cells in format 'aa-bb-cccccccc/yyyy/[email protected]', where xx are the captions in format (A0-Z9) and yyyy are the numbers in format (0001-9999). We want to find the mean of yyyy for cells, which have xx='X1'. The table is huge, so iterating over each cell seems problematic. Can I use here somehow function filter(like='X1'), I have troubles finding a way to extract yyyy where this condition is satisfied.
import pandas as pd
df = pd.read_csv("dane.csv", sep=';')
shape = list(df.shape)
for i in range(0,shape[1]):
if df.iloc[:,i].filter(like='X1')
Example of input, the first row are column names.
1: Brandenburg (Post-Panamax) 2: Acheron (Feeder)
ES-NL-10633096/1938/[email protected]/6749 DE-JP-20438082/2066/[email protected]/68849
BE-BR-61613986/3551/[email protected]/39927 NL-LK-02275406/2136/[email protected]/73198
PH-SA-39552610/2436/[email protected]/80578
PA-AE-59814691/4881/[email protected]/25247 OM-PH-31303222/3671/[email protected]/52408
Upvotes: 0
Views: 34
Reputation: 30991
As I understand, you want to compute one mean value, for all cells meeting the criterion, regardless of the column where it is.
Start from import re.
Then compile the regex:
pat = re.compile(r'(?:[A-Z]{2}-){2}\d+/(?P<n>\d+)/X1@[A-Z]+\.[A-Z]{2}',
flags=re.I)
And the result can be computed with the following one-liner:
pd.Series(df.values.reshape(-1, 1).squeeze()).str.extract(pat)\
.dropna().astype(int).n.mean()
For your sample data the result is 3409.5.
To get acquainted with the way the computation is done, execute each step as a separate instruction and watch the results.
Upvotes: 1