HelloToEarth
HelloToEarth

Reputation: 2117

Adjusting dataframe columns in Pandas

I have a dataframe output that looks like this:

Index      Region    Date
0       W S CENTRAL  Sep 2018
1               388          
0          MOUNTAIN  Sep 2018
1               229          
0           PACIFIC  Sep 2018
1               145 

I would like to put each iteration of numerical value underneath every Region, ie: 388 under Region, and place it in a new column right beside the Region column called Total.

The data begins in .txt form and reads into the script as a list within a list like:

[[W S CENTRAL, 388], [MOUNTAIN, 229], [PACIFIC, 145]]

I'd like my output to be:

Region      Total      Date
WS CENTRAL  388       Sep 2018
MOUNTAIN    229       Sep 2018
PACIFIC     145       Sep 2018

So then I can groupby() the Date for each region.

Code for the parse of lists into the dataframe is:

def join_words(n):
    frames = list()

    for listy in n:
        grouper = groupby(listy, key=str.isalpha)
        joins = [[' '.join(v)] if alpha_flag else list(v) for alpha_flag, v in grouper]
        res = list(chain.from_iterable(joins))
        df = pd.DataFrame(res, columns = ['Region'])
        df['Date'] = os.path.split(file)[-1]
        frames.append(df)

    new_df = pd.concat(frames)
    return new_df

The issue arises when changing the res variable into a dataframe; as res prints as the list version of what I want as an output. The grouper and joins variables are used to pass through strings next to one another and join them into a single string (for country name purposes).

Upvotes: 0

Views: 34

Answers (1)

asimo
asimo

Reputation: 2500

You can use the shift function in your case. (looking at how your dataframe looks like)

df['Total'] = df['Region'].shift(-1)

df = df[df.index %2 == 0]
order = [0,2,1]
df = df[df.columns[order]]

Upvotes: 1

Related Questions