Reputation: 59
13-JUL-17
Bank User Space Occupied(GB)
------------------------------ ------------------
CKYC_MNSB .004211426
CORE_AMARNATH_ASP 8.75262451
CORE_AMBUJA 6.80389404
CORE_AMBUJA_ASP 10.0085449
CORE_ANAND_MERC_ASP 18.9866333
CORE_BALOTRA 17.8280029
CORE_BASODA 4.55432129
CORE_CHHAPI_ASP 11.9767456
CORE_DHANGDHRA_ASP 13.1849976
CORE_IDAR_ASP 13.3209229
CORE_JANTA_HALOL_ASP 12.7955933
Bank User Space Occupied(GB)
------------------------------ ------------------
CORE_JHALOD_URBAN_ASP 9.19219971
CORE_MANINAGAR 5.36090088
CORE_MANINAGAR_ASP 6.31414795
CORE_SANKHEDA 20.4329834
CORE_SMCB_ANAND_ASP 11.3191528
CORE_TARAPUR_ASP 8.24627686
CORE_VUCB .000610352
TBA_TEMP 5.39910889
TEST_DUNIA 4.15698242
20 rows selected.
TABLESPACE NAME Free Space in GB
------------------------------ ----------------
TBAPROJ 33.2736816
I have above text file.
How to store in CSV file with column separated?
I have load file but its very difficult to remove blank space from the file.
Upvotes: 1
Views: 84
Reputation: 94212
Each line you want matches the pattern of a word made from capital letters and underscores, then spaces, then a number that has a decimal point in it. so this grep
will filter those out:
> file_raw <- readLines('file.txt')
> read.table(
text=paste(
file_raw[
grep("^[A-Z_].*\\s*\\.",file_raw)
],
collapse="\n"),
sep="",head=FALSE)
V1 V2
1 CKYC_MNSB 0.004211426
2 CORE_AMARNATH_ASP 8.752624510
3 CORE_AMBUJA 6.803894040
4 CORE_AMBUJA_ASP 10.008544900
5 CORE_ANAND_MERC_ASP 18.986633300
6 CORE_BALOTRA 17.828002900
7 CORE_BASODA 4.554321290
8 CORE_CHHAPI_ASP 11.976745600
9 CORE_DHANGDHRA_ASP 13.184997600
10 CORE_IDAR_ASP 13.320922900
11 CORE_JANTA_HALOL_ASP 12.795593300
12 CORE_JHALOD_URBAN_ASP 9.192199710
13 CORE_MANINAGAR 5.360900880
14 CORE_MANINAGAR_ASP 6.314147950
15 CORE_SANKHEDA 20.432983400
16 CORE_SMCB_ANAND_ASP 11.319152800
17 CORE_TARAPUR_ASP 8.246276860
18 CORE_VUCB 0.000610352
19 TBA_TEMP 5.399108890
20 TEST_DUNIA 4.156982420
21 TBAPROJ 33.273681600
Note that if you are expecting any of the first tokens to not match the pattern, for example CORE_999
or lower_case
then you need to adjust the pattern. But without a formal spec we can only go on what you supplied.
Upvotes: 1
Reputation: 7023
There might be possibly a more elegant way, but this does the trick:
# read raw file in lines
file_raw <- readLines('file.txt')
# remove whitespace
file_trim <- trimws(file_raw,which = 'both')
# remove empty lines
file_trim <- file_trim[file_trim != '']
# sub white space with separator ,
file_csv <- gsub('\\s{2,}',',',file_trim)
In the end there will be still some things left like the --
separators and 20 rows selected.
, but that can be filtered out easily if you want, before writing or after reading it:
file_clean <- file_csv[!grepl('(-){3,}|rows selected',file_csv)]
write.csv(file_clean,'file_cleaned.csv')
> read.csv('file_cleaned.csv')
X x
1 1 13-JUL-17
2 2 Bank User,Space Occupied(GB)
3 3 CKYC_MNSB,.004211426
4 4 CORE_AMARNATH_ASP,8.75262451
5 5 CORE_AMBUJA,6.80389404
6 6 CORE_AMBUJA_ASP,10.0085449
7 7 CORE_ANAND_MERC_ASP,18.9866333
8 8 CORE_BALOTRA,17.8280029
9 9 CORE_BASODA,4.55432129
10 10 CORE_CHHAPI_ASP,11.9767456
11 11 CORE_DHANGDHRA_ASP,13.1849976
12 12 CORE_IDAR_ASP,13.3209229
13 13 CORE_JANTA_HALOL_ASP,12.7955933
14 14 Bank User,Space Occupied(GB)
15 15 CORE_JHALOD_URBAN_ASP,9.19219971
16 16 CORE_MANINAGAR,5.36090088
17 17 CORE_MANINAGAR_ASP,6.31414795
18 18 CORE_SANKHEDA,20.4329834
19 19 CORE_SMCB_ANAND_ASP,11.3191528
20 20 CORE_TARAPUR_ASP,8.24627686
21 21 CORE_VUCB,.000610352
22 22 TBA_TEMP,5.39910889
23 23 TEST_DUNIA,4.15698242
24 24 TABLESPACE NAME,Free Space in GB
25 25 TBAPROJ,33.2736816
Upvotes: 1