Vedant Ghavate
Vedant Ghavate

Reputation: 13

Spilt items in a row to columns through pandas

I need to splits items in a row to columns on the occurence of '^' character. This needs to done through pandas without a loop preferably

I have

0   A^
3   206-1C
4   502-2B
5   506-0.5C
6   604-1B
7   907-2B
8   G.ELITE^
9   A201-1C
10  A202-1B

I want

A^ G.ELITE^
206-1C A201-1C.
502-2B. A202-1B
506-0.5C
604-1B
907-2B.
test=rd.query('texts.str.contains("\^")', engine='python')

Upvotes: 0

Views: 80

Answers (3)

Soudipta Dutta
Soudipta Dutta

Reputation: 2152

import pandas as pd
import pandas as pd
import janitor
import numpy as np
# Sample DataFrame
df = pd.DataFrame({'col': ['A^', '3', '206-1C', '502-2B', '506-0.5C', '604-1B', '907-2B', 'G.ELITE^', 'A201-1C', 'A202-1B']})
"""print(df)
        col
0        A^
1         3
2    206-1C
3    502-2B
4  506-0.5C
5    604-1B
6    907-2B
7  G.ELITE^
8   A201-1C
9   A202-1B"""

sep='^'
fill_value = np.nan
end_with_sep = df.iloc[:,0].str.endswith(sep)
"""print(end_with_sep)
0     True
1    False
2    False
3    False
4    False
5    False
6    False
7     True
8    False
9    False"""
values_ending_with_caret = df[end_with_sep].iloc[:, 0].tolist()
"""print(values_ending_with_caret )
['A^', 'G.ELITE^']"""

#split_points = end_with_sep.cumsum()[end_with_sep].index.tolist() + [len(df)]

split_points = list(end_with_sep.index[np.where(end_with_sep)[0]]) + [len(df)]

print(split_points)#[0, 7, 10]
split_points1 = split_points[:-1]
print(split_points1)#[0, 7]
split_points2 = split_points[1:]
print(split_points2)#[7, 10]

dfs = []
for i,j in zip(split_points1,split_points2):
    # start from i+1 to exclude the header row
    aa = df.iloc[i +1:j].reset_index(drop=True)
    """print(aa)
            col
       0     3
       1    206-1C
       2    502-2B
       3  506-0.5C
       4    604-1B
       5    907-2B"""
    dfs.append(aa)
    #print(dfs)
    
res = pd.concat(dfs,axis=1)
"""print(res) 
        col      col
0         3  A201-1C
1    206-1C  A202-1B
2    502-2B      NaN
3  506-0.5C      NaN
4    604-1B      NaN
5    907-2B      NaN"""   

res.columns = values_ending_with_caret
"""print(res)
        A^   G.ELITE^
0         3   A201-1C
1    206-1C   A202-1B
2    502-2B      NaN
3  506-0.5C      NaN
4    604-1B      NaN
5    907-2B      NaN
"""

Upvotes: 0

Nick
Nick

Reputation: 147216

One way to do this is to split the dataframe where the value ends in ^ and then join those split dataframes horizontally, setting the column names from the splitting values:

# find values ending with ^
m = df.iloc[:, 0].str.endswith('^')

# split the dataframe at those points
s = list(np.nonzero(m)[0]) + [len(df)]
dfs = [df.iloc[s[i]+1:s[i+1]].reset_index(drop=True) for i in range(len(s)-1)]

# append the split frame as columns
out = pd.concat(dfs, axis=1).fillna('')

# and set the column names
out.columns = df[m].iloc[:, 0].to_list()

Assumed sample data:

df = pd.DataFrame({'value': ['A^', '206-1C', '502-2B', '506-0.5C', '604-1B', '907-2B', 'G.ELITE^', 'A201-1C', 'A202-1B']})

Output:

         A^ G.ELITE^
0    206-1C  A201-1C
1    502-2B  A202-1B
2  506-0.5C
3    604-1B
4    907-2B

Upvotes: 0

mozway
mozway

Reputation: 262284

You can try a pivot based approach:

# get column of interest
s = df.iloc[:, 0]

# identify strings ending in "^"
m = s.str.endswith('^')

# reshape
out = (s
  .to_frame(name='value')
  .assign(col=s.where(m).ffill(),
          index=s.groupby(m.cumsum()).cumcount())[~m]
  .pivot(index='index', columns='col', values='value')
)

Output:

col          A^ G.ELITE^
index                   
1        206-1C  A201-1C
2        502-2B  A202-1B
3      506-0.5C      NaN
4        604-1B      NaN
5        907-2B      NaN

Upvotes: 2

Related Questions