C90/DMF - Collected Slides
Slide 1
1 Title C90/DMF - Title
National Energy Research Scientific Computing Center
rkowen@nersc.gov
or
rk@owen.sj.ca.us
Slide 2
1.1 Abstract C90/DMF - Abstract
``Providing Extra Service: What To Do With Migrated User Files?''
When faced with decommissioning our popular C90 machine, there was a problem of what to do with the 80,000 migrated files in 40,000 directories and the users' non-migrated files. Special software and scripts were written to store the file inode data, parse the DMF database, and interact with the tape storage system. Design issues, problems to overcome, boundaries to cross, and the hard reality of experience will be discussed.
Slide 3
2 Introduction C90/DMF - Introduction
- Cray Y-MP C90 / 16 PEs / 1 GW / UNICOS 9.2
- Used continuously 5+ years
- 1450 users, up to 3 GB superhomes each
- Home directories:
- 60 GB of on-line storage
- 60 TB of off-line storage (migrated files)
- Active until last day (Dec 31, 1998)
- 7 days to clean & pack machine
Slide 4 What can we do or what do we tell the users?
2.1 Options C90/DMF - Options
- Do a full back-up
Not enough:
- On-line disk space for all the user files
- Time to transfer off-line files to disk
- Tell users to back-up to tape storage themselves
(take no action)
- Same problem as above
- Migrated file thrashing
- Garner bad will
- Migrated files already in tape storage!
- Rename them?
- Many technical issues to resolve
Slide 5 The overwhelming challenges to master:
3 The Challenge C90/DMF - The Challenge
- Interface between 2 groups
- One for the C90
- Other for the tape storage system
- Understand UNICOS file systems and migration
- Understand HPSS (tape storage system)
- Portable to at least one other machine
- 2 month development cycle
- Data collection must work 1st time, no second chance
Slide 6 The UNICOS File System & Migrated files
3.1 UNICOS file systems C90/DMF - UNICOS and Migrated Files
Slide 7 DMF handles migrated files - placing or retrieval
3.2 DMF database C90/DMF - DMF database
- Tracks migrated file status
- Duplicates some inode info
- Contains location in Tape Storage System
- In DB:
/testsys/migration_dmf/ama_migrate/3381d338_8/000000408055
- In HPSS:
/DMF/ama_migrate/migration_dmf/3381d338_8/00000040805
Slide 8
3.3 HPSS C90/DMF - HPSS
- HPSS - High Performance Storage System
- Hides tape storage details from users
- Looks like a hierarchical directory structure
- Access with ftp or hsi
Slide 9 Nothing that time and money can't solve.
4 The Solution C90/DMF - The Solution
- Lots of test codes
- Understand the inode info (struct stat)
- How to extract info from DMFdb
- How to interface with HPSS
- Integrate pieces
- Test and verify
- Plan for contingencies
Slide 10
4.1 Saving inode info C90/DMF - Saving inode Information
- Mirror user directory structure
- Only consider migrated files
- Check DMF dm_mode
- Roll inode struct stat in another struct
- DMF dm_id dm_key
- Add file name
- Store info to file in mirrored directory
- Portably read file from any Cray system
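The record described above (stat fields wrapped in a second structure, plus the DMF identifiers and the file name) might be sketched as follows. This is a minimal Python illustration, not the original code: the choice of fields, the fixed big-endian layout, and the names `InodeRecord`, `pack_record`, and `unpack_record` are all assumptions, and `dm_id` and `dm_key` are passed in as plain integers because the real DMF fields exist only on UNICOS.

```python
import os
import struct
from typing import NamedTuple

# Assumed fixed, big-endian layout so a record written on one machine can be
# read on another: mode, uid, gid, size, mtime, dm_id, dm_key, name length.
_HEADER = struct.Struct(">IIIQqQQH")


class InodeRecord(NamedTuple):
    mode: int
    uid: int
    gid: int
    size: int
    mtime: int
    dm_id: int   # hypothetical stand-in for the DMF dm_id field
    dm_key: int  # hypothetical stand-in for the DMF dm_key field
    name: str


def pack_record(path: str, dm_id: int, dm_key: int) -> bytes:
    """Wrap selected stat fields, the DMF identifiers, and the file name."""
    st = os.lstat(path)
    # surrogateescape keeps file names with odd bytes round-trippable.
    name = os.path.basename(path).encode("utf-8", "surrogateescape")
    header = _HEADER.pack(st.st_mode, st.st_uid, st.st_gid, st.st_size,
                          int(st.st_mtime), dm_id, dm_key, len(name))
    return header + name


def unpack_record(blob: bytes) -> InodeRecord:
    """Inverse of pack_record."""
    *fields, name_len = _HEADER.unpack_from(blob)
    name = blob[_HEADER.size:_HEADER.size + name_len]
    return InodeRecord(*fields, name.decode("utf-8", "surrogateescape"))
```

Writing each field with an explicit size and byte order is one way to get the "portably read" property; the bytes returned by `pack_record` would be stored in a file under the mirrored directory.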
Slide 11 DMF Database
4.2 Accessing DMFdb C90/DMF - Accessing DMFdb
- Direct access
- too difficult
- No vendor API libraries
- Real-time access - not needed
- Dump and Read
- Use /usr/lib/dm/dmdbase -t dmdb
- Portable and independent of machine
- Easily read & stored into GNU gdbm
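The dump-and-read approach can be illustrated with a small loader that turns a text dump into a keyed database. This is only a sketch under stated assumptions: the slides do not show the dump format, so the one-record-per-line "key, then the rest" layout is hypothetical, as are the names `load_dump` and `lookup`. Python's portable `dbm.dumb` module stands in for GNU gdbm here.

```python
import dbm.dumb
from typing import Iterable, Optional


def load_dump(lines: Iterable[str], db_path: str) -> int:
    """Store one entry per dump line, keyed by its first field.

    Assumed (hypothetical) line format: "<key> <rest of record>".
    Blank lines and lines without a second field are skipped.
    """
    stored = 0
    with dbm.dumb.open(db_path, "c") as db:
        for line in lines:
            parts = line.strip().split(None, 1)
            if len(parts) != 2:
                continue
            key, record = parts
            db[key.encode()] = record.encode()
            stored += 1
    return stored


def lookup(db_path: str, key: str) -> Optional[str]:
    """Return the stored record for key, or None if it is absent."""
    with dbm.dumb.open(db_path, "r") as db:
        value = db.get(key.encode())
    return value.decode() if value is not None else None
```

Once the dump is loaded, each migrated file's identifier from the saved inode data can be looked up without touching the live DMF database.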
Slide 12
4.3 Interfacing HPSS C90/DMF - Interface with HPSS
- Hack public-domain ftp code and patch in
- Handling all exceptional cases
- Can't test all contingencies
- Not flexible
- Not enough time for robust code
- fork/exec ftp/hsi session
- Buffering is a problem
- No immediate feed-back
- Need hacked ftp to force line buffering
- Else use pseudo-tty mechanism
- Use expect (Tcl add-on)
- Designed for fronting interactive sessions
- Uses pseudo-tty
- Known scripting language & heavily documented
- ``Talks'' between two interactive programs
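The pseudo-tty idea behind expect can be shown in a few lines. The sketch below is not the expect script used for the project; it uses Python's standard `pty` module to run a child program on a pseudo-terminal, so the child believes it is interactive and its prompts arrive without the buffering delay a plain pipe can cause. The helper names `read_until` and `converse` are assumptions made for illustration, and the code needs a Unix-like system that provides pseudo-terminals.

```python
import os
import pty
import select
import subprocess
import time
from typing import Sequence


def read_until(fd: int, marker: bytes, timeout: float = 10.0) -> bytes:
    """Read from fd until marker appears, the stream ends, or time runs out."""
    seen = b""
    deadline = time.monotonic() + timeout
    while marker not in seen:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            break
        try:
            chunk = os.read(fd, 1024)
        except OSError:  # the child closed its end of the pseudo-tty
            break
        if not chunk:
            break
        seen += chunk
    return seen


def converse(argv: Sequence[str], prompt: bytes, reply: bytes,
             expected: bytes) -> bytes:
    """Run argv on a pseudo-tty, answer one prompt, return the transcript."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(list(argv), stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)  # the parent talks through the master side only
    try:
        transcript = read_until(master, prompt)
        os.write(master, reply + b"\n")
        transcript += read_until(master, expected)
        proc.wait(timeout=10)
    finally:
        if proc.poll() is None:
            proc.kill()
        os.close(master)
    return transcript
```

A real session would loop over many prompt and reply pairs and handle error messages; the prompts and commands of ftp or hsi are not shown here because the slides do not list them.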
Slide 13
5 The Reckoning C90/DMF - The Reckoning
- Time was coming to an end
- Last minute design requirements
- Had to handle non-migrated files
- Politics
- Check-list
Slide 14 As with most projects there are always last minute design requirements.
5.1 Last Minute Requirements C90/DMF - Last Minute Requirements
No longer had spare disks for users' non-migrated files
Needed script to:
- cd to each user's home directory
- tabulate non-migrated files with GNU find
- pipe into cpio
- rm tabulated files
- create README, append list of files
- force migrate archive and README
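The steps above can be sketched for a single home directory as follows. This is an illustration under assumptions, not the original script: it uses Python's `os.walk` and `tarfile` in place of GNU find and cpio, the `is_migrated` test is a hypothetical caller-supplied function (on UNICOS it would inspect the DMF state of the file), and the archive and README names are invented defaults.

```python
import os
import tarfile
from typing import Callable, List


def archive_unmigrated(home: str,
                       is_migrated: Callable[[str], bool],
                       archive_name: str = "unmigrated.tar",
                       readme_name: str = "README") -> List[str]:
    """Archive and remove the non-migrated regular files under home."""
    archive_path = os.path.join(home, archive_name)
    readme_path = os.path.join(home, readme_name)

    # 1. Tabulate candidates first so the archive never includes itself.
    listed = []
    for dirpath, _dirnames, filenames in os.walk(home):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path == archive_path or os.path.islink(path):
                continue
            if os.path.isfile(path) and not is_migrated(path):
                listed.append(os.path.relpath(path, home))
    listed.sort()

    # 2. Write the archive (tar here, standing in for cpio).
    with tarfile.open(archive_path, "w") as tar:
        for rel in listed:
            tar.add(os.path.join(home, rel), arcname=rel)

    # 3. Remove the originals only after the archive has been closed.
    for rel in listed:
        os.remove(os.path.join(home, rel))

    # 4. Create the README and append the list of archived files.
    with open(readme_path, "a", encoding="utf-8",
              errors="surrogateescape") as readme:
        readme.write("Files archived in %s:\n" % archive_name)
        for rel in listed:
            readme.write(rel + "\n")

    # 5. Forcing migration of the archive and README is site-specific
    #    and is not shown here.
    return listed
```

Building the complete list before writing or deleting anything keeps the archive, the removals, and the README consistent with one another.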
Slide 15 Can't rely on memory ... make a check-list
5.2 Check-List C90/DMF - Check-List
- Check resources & privileges
- Copy $HOME environment somewhere safe
- Compile & verify tools & scripts
- Disable auto-migration
- Archive non-migrated files & migrate
- Create inode data & mirrored directories
- Trim empty data directories
- Dump DMFdb to GNU gdbm
- Save data elsewhere for safety
- Start expect script to talk to HPSS
- Monitor and correct as needed
- Clean-up & notify system staff
- Celebrate!
Slide 16 Zero Hour:
6 Juncture C90/DMF - Juncture
- Follow check-list
- Resolve problems
- Task took 3 full days
- Finished 1 hour after expected
Slide 17 Problems
7 And Hard Experience C90/DMF - And Hard Experience
- File names with non-standard characters
- Trimming the empty directories
- Sleep deprivation
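The slides do not say what went wrong when trimming the empty directories. As a generic illustration only, one common way to prune a mirrored tree is a bottom-up walk that removes each directory once nothing is left inside it; the function name `trim_empty_dirs` is an assumption.

```python
import os
from typing import List


def trim_empty_dirs(root: str) -> List[str]:
    """Remove directories under root that end up empty, deepest first."""
    removed = []
    # Bottom-up order means a parent is examined after its children are gone.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # keep the top of the mirrored tree itself
        if not os.listdir(dirpath):  # re-check on disk, not the stale walk lists
            os.rmdir(dirpath)
            removed.append(os.path.relpath(dirpath, root))
    return removed
```

Because the walk is bottom-up, a chain of nested empty directories collapses in a single pass.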
Slide 18
8 Conclusion C90/DMF - Conclusion
- The task was successful - too successful
- 1450 User $HOME Directories
- 60 GB on-line data
- 60 TB off-line data
- 41,200 directories traversed
- 85,000 migrated files
- Handled 1250 problem files later
- Could easily trim empty data directories in inode pass
- Should have full access to system
- Requires a wide skill set
- Programming & scripting
- Quick skill acquisition
- Negotiator
- Understand the BIG picture
- Break into manageable pieces
- Ask for hazard pay!