weakincodingabc
weakincodingabc

Reputation: 83

Extract column value based on another column, reading multiple files

I will like to extract out the values based on another on Name,Grade,School,Class. For example if I were to find Name and Grade, I would like to go through column 0 and find the value in the next few column, but the value is scattered(to be extracted) around the next column. Same goes for School and Class.

Refer to this: extract column value based on another column pandas dataframe

I have multiple files:

      0      1     2     3      4     5    6    7     8
0    nan    nan  nan  Student Registration   nan    nan
1    Name:   nan  nan   John   nan   nan  nan  nan   nan
2    Grade:  nan   6    nan    nan   nan  nan  nan   nan
3     nan    nan  nan  School: C College  nan  Class: 1A               

      0      1     2     3      4     5    6    7     8    9
0    nan    nan   nan  Student Registration     nan  nan  nan
1     nan    nan  nan    nan   nan    nan  nan   nan nan  nan
2    Name: Mary   nan    nan   nan    nan  nan   nan nan nan
3    Grade:   7   nan    nan   nan    nan  nan   nan nan nan
4     nan   nan   nan  School: nan   D College Class: nan 5A     

This is my code: (Error)

for file in files:
       df = pd.read_csv(file,header=0)
       df['Name'] = df.loc[df[0].isin('Name')[1,2,3]
       df['Grade'] = df.loc[df[0].isin('Grade')[1,2,3]
       df['School'] = df.loc[df[3].isin('School')[4,5]
       df['Class'] = df.loc[df[7].isin('Class')[8,9]
       d.append(df)
df = pd.concat(d,ignore_index=True)

This is the outcome I want: (Melt Function)

Name Grade  School    Class ... .... ... ... 
John  6    C College   1A
John  6    C College   1A
John  6    C College   1A
John  6    C College   1A
Mary  7    D College   5A
Mary  7    D College   5A
Mary  7    D College   5A
Mary  7    D College   5A

Upvotes: 1

Views: 415

Answers (1)

jezrael
jezrael

Reputation: 863431

I think here is possible use:

for file in files:
    df = pd.read_csv(file,header=0)
    #filter out first column and reshape - removed NaNs, convert to 1 column df
    df = df.iloc[1:].stack().reset_index(drop=True).to_frame('data')
    #compare by :
    m = df['data'].str.endswith(':', na=False)
    #shift values to new column
    df['val'] = df['data'].shift(-1)
    #filter and transpose
    df = df[m].set_index('data').T.rename_axis(None, axis=1)
    d.append(df)
df = pd.concat(d,ignore_index=True)

EDIT:

You can use:

for file in files:
    #if input are excel, change read_csv  to read_excel
    df = pd.read_excel(file, header=None)

    df['Name'] = df.loc[df[0].eq('Name:'), [1,2,3]].dropna(axis=1).squeeze()
    df['Grade'] = df.loc[df[0].eq('Grade:'), [1,2,3]].dropna(axis=1).squeeze()
    df['School'] = df.loc[df[3].eq('School:'), [4,5]].dropna(axis=1).squeeze()
    df['Class'] = df.loc[df[6].eq('Class:'), [7,8]].dropna(axis=1).squeeze()

    print (df)

Upvotes: 1

Related Questions