CVS log for pvfs2/src/io/dev/pint-dev.c
merge change for munlock with no ifdef
munlock memory that was hiding in an ifdef
merge with Orange-Branch
remove extraneous debugging
various memory related changes. mlock bufmap pages, mark pages that are kmap'd and modified as dirty, add locking around op initialization for safety. these changes need testing on 2.4 and 2.6 kernels to make sure they behave as expected
Committed Windows client code to Orange-Branch.
Porting job files
Merged in changes from Orange-Branch. There were a number of bugs fixed there since this branch was created.
merging Orange Branch changes in
initial merge with Orange-Branch. much will be broken
Removed duplicate reference to gossip_debug_mask in pint-dev.c. Added prototypes for new functions in pint-event.c and pint-mem.c
Merged failover with tree code. Modified Files: Tag: Orange-Branch prepare src/apps/kernel/linux/pvfs2-client-core.c src/client/sysint/client-state-machine.c src/client/sysint/client-state-machine.h src/client/sysint/finalize.c src/client/sysint/sys-getattr.sm src/client/sysint/sys-io.sm src/client/sysint/sys-small-io.sm src/common/gossip/gossip.h src/common/misc/msgpairarray.sm src/common/misc/pint-event.c src/common/misc/pint-event.h src/io/bmi/bmi.c src/io/dev/pint-dev.c src/io/job/job.c src/kernel/linux-2.6/devpvfs2-req.c src/kernel/linux-2.6/file.c src/kernel/linux-2.6/pvfs2-mod.c src/proto/PINT-le-bytefield.c src/proto/pvfs2-req-proto.h src/server/create-immutable-copies.sm src/server/get-attr.sm src/server/small-io.sm src/server/tree-communicate.sm src/server/request-scheduler/request-scheduler.c src/server/request-scheduler/request-scheduler.h
When the kernel module is loaded with a debug mask parameter, the code accepts the parameter into a 32-bit variable, which is then cast into a 64-bit parameter. The insmod command does not handle 64-bit input parameters, but the kernel gossip_debug_mask should be a 64-bit value to accommodate the existing gossip-debug functions. Modified Files: src/common/gossip/gossip.h src/io/dev/pint-dev.c src/kernel/linux-2.6/devpvfs2-req.c src/kernel/linux-2.6/pvfs2-mod.c
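The widening step described in this entry can be sketched in plain C. The function name below is hypothetical and not a pvfs2 identifier; the only point is that a 32-bit load-time value is promoted to the 64-bit mask type, going through an unsigned type first so that a signed parameter is not sign-extended.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit load-time parameter into a 64-bit
 * debug mask. Casting through uint32_t first keeps the upper 32 bits
 * zero even when the parameter was declared as a signed int. */
uint64_t demo_widen_debug_mask(int32_t param)
{
    return (uint64_t)(uint32_t)param;
}
```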
fix some compile errors on Fedora 11 (Tinderbox Stan build)
Added functionality to the kernel module and client-core allowing the gossip debug masks for either the client or the kernel to be modified AFTER the client-core is started. Modified Files: include/pvfs2-debug.h src/apps/kernel/linux/pvfs2-client-core.c src/common/misc/pvfs2-debug.c src/io/dev/pint-dev.c src/io/dev/pint-dev.h src/kernel/linux-2.6/dcache.c src/kernel/linux-2.6/devpvfs2-req.c src/kernel/linux-2.6/pvfs2-dev-proto.h src/kernel/linux-2.6/pvfs2-mod.c src/kernel/linux-2.6/pvfs2-proc.c src/kernel/linux-2.6/pvfs2-utils.c src/kernel/linux-2.6/upcall.h
These changes give us the ability to modify the gossip-debug-mask for the client-core or the kernel module dynamically via the proc system. The following files were modified: include/pvfs2-debug.h src/apps/kernel/linux/pvfs2-client-core.c src/common/misc/pvfs2-debug.c src/io/dev/pint-dev.c src/io/dev/pint-dev.h src/kernel/linux-2.6/dcache.c src/kernel/linux-2.6/devpvfs2-req.c src/kernel/linux-2.6/pvfs2-dev-proto.h src/kernel/linux-2.6/pvfs2-mod.c src/kernel/linux-2.6/pvfs2-proc.c src/kernel/linux-2.6/pvfs2-utils.c src/kernel/linux-2.6/upcall.h
merged in changes from summer at LANL
committing patch contributed by Bart Taylor to fix buffer corruption under heavy memory load on RHEL3 kernels.
updated all references in include/, src/client/, src/common/, src/io/, src/proto/, and src/server/ to use the new PVFS_credential in place of the old PVFS_credentials. the admin apps in particular need to be updated to use the new API.
Reverse merged and ported to HEAD.
merging job/dev bug fix from trunk
merging job/dev bug fix from trunk
Fixed a bug in handling unexpected device jobs. A big enough load of concurrent operations (particularly with threaded pvfs2-client) could cause a client-core assertion, and it would likely happen repeatedly when client-core restarted.
Sam's prelude and scheduler updates
separate user space device debugging mask from kernel space device debugging mask; take the former out of the "verbose" mask so that it doesn't continuously clutter logging output https://trac.mcs.anl.gov/projects/pvfs/ticket/1
[on behalf of Phil]:
pvfs2-aio-cancel.patch
----------------------
This patch fixes a bug in the I/O cleanup path on the server side. In cases where a flow needed to cancel pending I/O operations, the trove cancel function was calling aio_cancel() directly. This doesn't work correctly if the alt-aio implementation is used.
pvfs2-root-squash-address.patch
-------------------------------
This fixes a bug in the root squash checking on the server side. The routine that compares a client address against the root squash list was using getsockname() rather than getpeername(). The former retrieves the server's address rather than the client's.
pvfs2-ls-rm.patch
-----------------
This is an interim fix for the concurrent "rm -rf" and "ls" problem that was recently discussed on the mailing list. It sounds like the long term direction is to switch to using entry names as dirent tokens, but this patch fixes the majority of cases in the mean time without a protocol change. The problem in the case I was seeing was a cache conflict between the two clients (the ls was caching tokens in the pcache that caused rm to get the wrong position). The token is 64 bits wide, but only the first 32 bits are used (the START and END values are near the top of the 32 bit range). This patch takes advantage of the extra top 32 bits on the server side to set a unique identifier in the token for each "readdir session" so that their cache entries do not collide. The client is not aware of this change because it treats the token as an opaque value. A readdir session begins when a client requests the START position.
pvfs2-client-buffer-logging.patch
---------------------------------
I don't know if there is any interest in this, but this adds some debugging to the buffers used in the kernel module. On startup, pvfs2-client will print the buffer pointers (whether debugging is enabled or not). There are also new debugging messages that will show the first byte of each memory buffer passing through the kernel if enabled. These logging messages were added to help track down what ended up being a server side problem (see pvfs2-aio-cancel.patch), but we kept it in case it is useful in the future.
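The token layout described in pvfs2-ls-rm.patch (position in the low 32 bits, a per-session identifier in the high 32 bits) can be sketched as follows. The helper names are hypothetical and this is not the server's actual code, only an illustration of why two sessions at the same position no longer produce the same token.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the low 32 bits carry the directory position,
 * the high 32 bits carry an identifier for the readdir session. */
uint64_t demo_token_pack(uint32_t session_id, uint32_t position)
{
    return ((uint64_t)session_id << 32) | position;
}

/* Extract the directory position (low 32 bits). */
uint32_t demo_token_position(uint64_t token)
{
    return (uint32_t)(token & 0xFFFFFFFFu);
}

/* Extract the session identifier (high 32 bits). */
uint32_t demo_token_session(uint64_t token)
{
    return (uint32_t)(token >> 32);
}
```

A client that treats the token as opaque simply hands the full 64-bit value back, so only the server needs to know about the split.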
Machines without ioctl will complain on the uses of _IOR and so forth, as well as ioctl, poll calls. The code to implement getting requests through a character device runs through about 12 files in the source tree. Rather than hacking out all the linux-kmod-related bits, selectively disable just the main commands but leave it all harmlessly compiled in. A new configure variable tells if a linux kernel module was requested.
Merge HEAD changes to TAS-branch.
merge of the WALT3 branch to HEAD. This patch changes the way state actions are represented as C structures (what statecomp generates). It also changes the main state action parameter from a s_op on the server or an sm_p on the client to a unified smcb pointer (state machine control block) for all state actions (both client and server). Finally, initial support for concurrent state machines has been added to allow state actions to be invoked concurrently .. a first step for server-to-server.
walt3 reverse merge from head includes merges of pw's sm changes (no state declarations), cleanup of state machine code, and other general merging/fixes.
Update migration branch to current CVS version
Sync hint-branch to current CVS version
merge 2.6 branch changes to head
Synchronization with HEAD
merge of murali's kernel buffer size tuning options to HEAD.
murali's patch to allow tuning kernel buffer settings from client-core.
Upgrade to current CVS version
Merged from trunk: added standalone option to client-core. Change fprintf(stderr, ...) to gossip_err since stderr gets reopened to /dev/null. Fixed smbp alloc bugs and corrected a few state machine action functions. Get the client context stuff right in client-core. Changed sys-lookup.sm to allocate lookup contexts on demand, instead of all at once. This should save on the client_sm allocation.
reverse merge of HEAD to WALT3 branch.
print exit for idle case
added some debug statements for client device testing
backmerging of HEAD to branch...
Merge HEAD into Walt's branch. Rework new state machines to the new cleanups introduced by Walt.
Merge posix-extensions-branch to HEAD. This branch implemented patches to the 2.6.16 kernel for the proposed POSIX I/O extensions; those patches are under the patches subdirectory. It also implements the PVFS2 specific hooks for these system calls. A tool that may be of immediate use to the general pvfs2 audience is the pvfs2-lsplus utility in src/apps/admin, which should be noticeably faster than the pvfs2-ls utility if there are a lot of objects in a single directory. Other features are left out by configure and are not even built if the kernels do not support those features and callbacks.
Reverse merges from HEAD..
reverse merge from trunk. working for now.
Gossip'ized kmod as well so that we don't have 2 separate calls for printing diagnostics. gossip for the kmod is fairly primitive and is handled simply by means of a macro in gossip.h. Replaced and removed pvfs2_print and pvfs2_error with gossip_debug. Edited quickstart to include comments on the new kmod debug parameters as well.
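A mask-gated debug macro of the kind this entry describes might look roughly like the sketch below. All names are hypothetical and this is a user-space stand-in: it prints with fprintf, whereas a kernel module would use the kernel's own logging call. The counter exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mask bits, one per subsystem. */
#define DEMO_DEBUG_FILE ((uint64_t)1 << 0)
#define DEMO_DEBUG_DEV  ((uint64_t)1 << 1)

uint64_t demo_debug_mask = 0;       /* which subsystems are enabled */
unsigned demo_messages_emitted = 0; /* bookkeeping for illustration only */

/* Print the message only when the caller's mask bit is enabled. */
#define demo_debug(mask, ...)                 \
    do {                                      \
        if (demo_debug_mask & (mask)) {       \
            demo_messages_emitted++;          \
            fprintf(stderr, __VA_ARGS__);     \
        }                                     \
    } while (0)
```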
revert style changes back to previous versions.
run maint/pvfs2indent-80col.sh on all .c files to correct style :)
merges fixes that went into head for consolidating I/O paths (read, write) and (readv, writev). Also contains the hooks for implementing readx, writex natively in PVFS2 (the vfs patch is part of the patches sub directory). This involved adding the following infrastructure:
a) Have a way of sending a variable length trailer as part of the upcalling mechanism. Currently the readx, writex implementation upcalls the file offsets, length pairs as this variable length trailer.
b) Client-core has a new upcall request (called FILE_IOX: very unimaginative name :)) which now does a isys_io with file_req set to req_hindexed as opposed to contiguous for FILE_IO upcalls.
c) Client-core does a read of the upcall from the device; if it finds that a trailer is present, it does another read from the device.
d) kmod, which maintains the pvfs2_kernel_op_t structures in the request list, needs to keep the structure around in the list (and not insert it in the in-progress hash table) if the op has an upcall trailer. kmod has hooks for feeding either the op or the trailer depending on which read call comes into the device. It also has hooks for delaying adding the op to the hash table until the trailer is picked off the device.
e) Added a bunch of test programs for the new readx/writex system calls.
Seems to pass tests on an x86 laptop!
Also contains fixes and cleanups of the readdir/readdirplus implementation, which required a variable length trailer as part of the downcall. These cleanups enable a very symmetrical implementation of variable length upcalls and downcalls!
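The two-step read in (c) can be sketched as below. The header layout, the function names and the in-memory "device" are all hypothetical stand-ins; the real upcall structure and device interface are not shown in this log. The sketch only illustrates reading a fixed-size header first and issuing a second read when the header announces a trailer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size upcall header; trailer_size == 0 means no trailer. */
struct demo_upcall {
    uint64_t tag;
    uint32_t type;
    uint32_t trailer_size;
};

/* Hypothetical read callback standing in for read(2) on the device. */
typedef long (*demo_read_fn)(void *ctx, void *buf, size_t len);

/* In-memory stand-in for the device: each call returns the next chunk. */
struct demo_device {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

long demo_device_read(void *ctx, void *buf, size_t len)
{
    struct demo_device *d = ctx;

    if (d->len - d->pos < len)
        return -1;
    memcpy(buf, d->data + d->pos, len);
    d->pos += len;
    return (long)len;
}

/* Read one upcall; if it announces a trailer, issue a second read for it.
 * Returns 0 on success, with *trailer NULL or a malloc'd buffer that the
 * caller must free. Returns -1 on any short read or allocation failure. */
int demo_read_upcall(demo_read_fn rd, void *ctx,
                     struct demo_upcall *up, void **trailer)
{
    *trailer = NULL;
    if (rd(ctx, up, sizeof(*up)) != (long)sizeof(*up))
        return -1;
    if (up->trailer_size == 0)
        return 0;

    *trailer = malloc(up->trailer_size);
    if (*trailer == NULL)
        return -1;
    if (rd(ctx, *trailer, up->trailer_size) != (long)up->trailer_size) {
        free(*trailer);
        *trailer = NULL;
        return -1;
    }
    return 0;
}
```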
Includes reverse merges from trunk. In addition, it includes fixes for readdir and readdirplus to have a variable length number of directory entries passed in from client-core using a trailer page (currently). However, there is nothing that prevents us from sharing those pages with the kernel using vmap and friends, which is the next logical step. So client-core now does a readdir/readdirplus and writev's the trailer page (which is essentially an encoded version of the readdir/readdirplus response) to the kmod. The kernel module decodes that and copies it to the user-space app (which issued the getdents/getdents_plus system call). Added/edited the getdents.c test program in the test/posix sub directory to issue getdents/getdents64/getdents_plus/getdents64_plus system calls. Also fixed a critical bug in readdir that was somehow never triggered: if the buffer size provided by glibc/user was not sufficient, we would advance the f_pos token beyond where we stopped.
commit of murali's 64bit fixes.
[pcarns]: add a protocol version to the pvfs2 device communication. ensures clients and the kernel module come from the same source tree, but also has the pleasing side effect of 8-byte aligning access to the header of the request.
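A versioned header of this kind could be laid out as in the sketch below. The field names, their order and the version value are assumptions made for illustration, not the actual pvfs2 device header. The sketch shows how a 32-bit version field placed next to another 32-bit field puts a following 64-bit field at an 8-byte offset, and how a reader can reject a mismatched version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEV_PROTO_VERSION 1 /* hypothetical value */

/* Hypothetical request header: two 32-bit fields followed by a 64-bit tag,
 * so the tag starts at byte offset 8. */
struct demo_dev_header {
    int32_t  proto_version;
    int32_t  magic;
    uint64_t tag;
};

/* Returns 0 when the header was produced by a matching protocol version,
 * -1 otherwise. */
int demo_check_header(const struct demo_dev_header *h)
{
    return h->proto_version == DEMO_DEV_PROTO_VERSION ? 0 : -1;
}
```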
- misc cleanups
- replaced all vfs 64 bit tags to be unsigned
- replaced pint-dev code to work with 64 bit unsigned tags
- moved all op initialization out of the constructor and into the op_alloc routine
- fixed tag cancellation upcall/method to use a 64 bit tag, rather than an unsigned long
- some cleanups
- print how many bytes were read on actual short reads
- translate non-errno error codes (such as cancellation) in the kernel code to avoid returning completely bunk error values
- fixed bug in pint-dev's test method that didn't clear the outcount on error. this caused the pvfs2-client-core to get short read errors, which were handled gracefully (by reposting the dev unexp), but it's better if it doesn't see them at all in that case (since there really was nothing read).
- started adding some debugging hooks for printing server response types (i only did getattr for now since that's the one that appears to be causing trouble)
- misc cleanups
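The outcount fix in the first item can be illustrated with a small sketch. The function and its simulated inputs are hypothetical; the real pint-dev test method reads from the device rather than from an array. The point is that the out-parameter is reset on the error path so the caller never sees a stale count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test routine: tries to collect up to incount messages.
 * results[i] < 0 simulates a device error on the i-th read. */
int demo_test_unexpected(const int *results, int incount, int *outcount)
{
    int i;

    *outcount = 0; /* defined even if we fail below */
    for (i = 0; i < incount; i++) {
        if (results[i] < 0) {
            *outcount = 0; /* report nothing read on error */
            return results[i];
        }
        (*outcount)++;
    }
    return 0;
}
```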
- applied Nathan's expandtab patch (expandtab-2.patch.gz) first referenced at: http://www.beowulf-underground.org/pipermail/pvfs2-developers/2004-July/000745.html
- remove unused header
- fix the op tags used from the vfs by going 64bit all the way, instead of relying on unreliable casts and assumptions (murali had a working prototype -- this is a similar idea but not based on it)
- use the Ld macros in the pvfs2-client-core where appropriate
- some cleanups
- merging in the pvfs2-nm-nb-branch with the main tree see ChangeLog for details, or browse the cvs history of the branch for full details
- on device reads, if the max_idle_time is 0, don't even call poll, just do a read on the device and handle unexp requests, or exit if nothing is there and ready
- check for poll errors on the device file
- terminate the device testing thread on poll error (as the device can no
longer be read properly), or return an error in non-threaded mode
[ this is only really useful for graceful shutdown of the threaded
client library ]
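The read path described in the two entries above (skip poll when max_idle_time is 0, otherwise poll and check for errors) can be sketched with the standard poll(2) and read(2) calls. The function names are hypothetical, the descriptor is assumed to be in non-blocking mode, and a return of 0 here means "nothing ready", which a real implementation would need to distinguish from end-of-file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success. */
int demo_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical device read: returns bytes read, 0 if nothing is ready,
 * or -1 on a poll or read error. With max_idle_time_ms == 0 poll is
 * skipped and the (non-blocking) read is attempted directly. */
long demo_dev_read(int fd, void *buf, size_t len, int max_idle_time_ms)
{
    ssize_t n;

    if (max_idle_time_ms > 0) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int ret = poll(&pfd, 1, max_idle_time_ms);

        if (ret < 0)
            return -1; /* poll itself failed */
        if (ret == 0)
            return 0;  /* idle timeout, nothing ready */
        if (pfd.revents & (POLLERR | POLLNVAL))
            return -1; /* descriptor can no longer be read properly */
    }

    n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;      /* nothing there and ready */
    return (long)n;
}
```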
fix some minor warnings (branch)
NOTE THAT THIS IS A BRANCH COMMIT (tagged as nm-nb-branch). Feel free to ignore it completely as it's a snapshot of a work in progress and it will crash your computer and reformat your hard drive.
This is almost the initial draft of the pvfs2-client-core that operates in a non-blocking manner. While it runs, there are still issues that need to be resolved (that are keeping it out of the main CVS trunk). Many other changes were made along the way, so it's more than just that.
- added compile time option for disabling thread-safety in the client library (enabled by default; --disable-thread-safety to disable)
- improved configure summary information emitted at configure time
- added missing non-blocking sysint declarations to sysint header
- re-wrote pvfs2-client-core to use sysint non-blocking operations where possible
- made sysint test and testsome() calls more useable from a user point of view
- merged dev unexp polling/handling with system interface
- added PINT_sys_dev_unexp call that allows posting unexpected device messages so that they can be returned from the sysint testsome method in addition to completed sysint operations
- many memory leaks fixed -- many more to go (started adding macros for freeing the server response objects)
- added a id_gen_fast_unregister macro that is a no op, to make the api more consistent with the id_gen_safe_* calls
- server-config-mgr: report mutex still in use if it is, but also make sure not to unlock an already unlocked mutex (valgrind complains)
- many assertions added
- many formatting changes
- many ptr assignments to NULL after freeing in the job interface; done while tracking down a problem
- modified the pint-dev device interface to make sure it can handle the pvfs2 device in a non-blocking manner
- pint-dev was using buffers larger than it needed across the device; fixed them to be the right size
- added a method to free the memory region mapped into the kernel through the device
- freed that mapped memory region on pvfs2-client-core shutdown (valgrind complained)
- modified device driver to work properly in non-blocking mode from userspace by implementing the character device poll method
- modified pint-dev test function to properly handle non-blocking responses from the device driver
- modified PINT_flow_reset to not allocate a new mutex unless the old one was destroyed
- modified all job uses of the id-generator to use the safe, rather than fast, methods (useful for several reasons including safe cancellation)
- modified cancellation methods to be able to handle ops that have already completed
- modified the job_dev_unexp method to have (and honor) the no immediate completion flag if passed (used in the pvfs2-client-core)
- modified the job completion callbacks to make sure to NOT add a completed job desc to the completion queue if it's already been added (by checking a flag, not scanning). this is a safety and should only be used when a non-thread-safe client library is being used in a thread safe env, but we should handle it gracefully anyway
- replaced all kernel allocations of ops through the slab allocator handled op_cache to be replaced by a wrapper method (op_alloc(), as suggested by Murali) -- the other cache allocations will probably be replaced later. this allows the removal of the extern op_cache declaration
- added macros for freeing some of the most heavily used server response messages (readdir, lookup, getattr) -- the others are coming later
- freed dirents coming out of sysint response object in the pvfs2-client
this fixes 3 separate things reported by robl and phil. this changelog lists
phil's description and a description of the fix:
---
1) This one isn't too bad, but if I run this:
./pvfs2-client -p pvfs2-client-core
instead of this:
./pvfs2-client -p ./pvfs2-client-core
... the output looks the same either way (ie, it appears successful).
In the former case, both the client and client-core actually exited,
because i specified the path to the core incorrectly. It would be nice
if it printed something in this case to avoid confusion.
---
Ok, done...this is tricky because if the client-core is in the same dir as
the client program, a 'stat' will pass, but the execvp may fail if that dir
isn't in the system PATH. Now, we print an error if the path is not absolute
and the client core exits.
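The mismatch between 'stat' and execvp comes from how the two resolve a name: stat resolves a name without a slash relative to the current directory, while execvp searches PATH for it. The sketch below shows a check for that condition. The function name is hypothetical, and this is not necessarily the exact test the fix uses (the message above says the check is for a non-absolute path).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when execvp would search PATH for this
 * name (it contains no slash), 0 when the name is used as given. */
int demo_uses_path_search(const char *path)
{
    assert(path != NULL);
    return strchr(path, '/') == NULL;
}
```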
---
2) I don't know if this happens everywhere or not, but I am running on a
box here where bad things happen if I get the hostname wrong on the
mount command, for example doing something like this:
mount -t pvfs2 tcp://a43:0/pvfs2-fs /mnt/pvfs2
(note that the port is wrong). Sorry I didn't catch this when you asked
me to try this stuff out the other day :( At any rate, the mount command
segfaults, and I get this in dmesg:
Attempting PVFS2 Mount via host tcp://a43:0/pvfs2-fs
Got an unknown pvfs2 error code: -1073741967
pvfs2_fs_mount: got return value of -1073741967
Unable to handle kernel paging request at virtual address 733d4853
printing eip:
c016b062
*pde = 00000000
---
This is due to an error code translation problem. Previously, the kernel
code didn't translate all pvfs_errno codes to actual errno codes, so I've
re-worked the error code translation functions so that they can be shared
by the kernel and the user space code.
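A shared translation routine of this kind might be structured like the sketch below. The encoding (a marker bit plus a table index), the table contents and the -EIO fallback are all invented for illustration; they are not the actual pvfs2 error encoding. What the sketch shows is the general idea: any value that is not already a plain errno is mapped through a table, and anything unrecognised falls back to a valid errno instead of leaking through.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical encoding: custom errors are -(DEMO_ERROR_BIT | index). */
#define DEMO_ERROR_BIT ((uint32_t)1 << 30)

static const int demo_errno_table[] = { EPERM, ENOENT, EINTR, EIO };

/* Translate a (possibly custom) negative error value into a negative errno.
 * Non-negative values are treated as "no error" and yield 0. */
int demo_error_to_errno(int32_t demo_err)
{
    uint32_t magnitude, index;

    if (demo_err >= 0)
        return 0;

    magnitude = (uint32_t)(-(int64_t)demo_err);
    if (!(magnitude & DEMO_ERROR_BIT))
        return demo_err; /* already a plain -errno value */

    index = magnitude & ~DEMO_ERROR_BIT;
    if (index >= sizeof(demo_errno_table) / sizeof(demo_errno_table[0]))
        return -EIO;     /* unknown code: fall back to a valid errno */
    return -demo_errno_table[index];
}
```

With this made-up encoding, the value from the dmesg output above (-1073741967) has the marker bit set and an index outside the table, so it takes the fallback path.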
---
3) This one is kinda nasty, and I'm not sure what's going wrong yet.
The scenario is that I have successfully mounted, but then on the next
operation (a getattr), the communication fails causing the msgpair.sm to
call exit(1). The pvfs2-client and client-core then exit. I start them
back up, and nothing ever works again. I get "Input/output error" any
time I try to access pvfs2 from then on, and I get "can't write
superblock" when I try to unmount. I have to reboot the box to get back
to sanity.
---
This turned out to be a beast of a problem now that we have dynamic mounts on
the client side. The issue is that on cancelled i/o, or on pvfs2-client-core
restart, there's *no* way the kernel can associate upcalls with the client-core
anymore because the dynamic mount information is lost forever as far as the
client-core is concerned.
The solution I chose for this is to have the pvfs2-client-core issue an ioctl
that causes the kernel to do mount upcalls for every pvfs2 file system it knows
about to avoid having to have the user manually issue another mount command
(or something goofy like that). Other solutions that I considered and didn't
like are 1) having the pvfs2-client-core store mount info in a file somewhere
and have it remount those on startup, or 2) having the pvfs2-client and
pvfs2-client-core have some ipc going on so that the pvfs2-client can store
the mount info and somehow give it back to the pvfs2-client-core on restart.
both of those are shoddy at best, and do not seem robust at all (i.e.
stale file info will error out on stale remount attempts, ipc requires that
the pvfs2-client never dies along with the pvfs2-client-core, etc).
With the current solution, we have all the information that the kernel knows
about (and so it's more reliable than a file or something), and it's the most
natural way to hand the information up from the kernel (since we have to do
this via upcalls anyway). Thus, the kernel code now keeps a list of pvfs2
superblocks and we store the info given to the sb at mount time so that it
can dynamically be remounted at any time. The pvfs2-client-core now issues
an ioctl (in a thread so that it can still service the request) on startup
to 'remount' any file systems the kernel knows about. while I'm calling this
a remount, it's basically just a mechanism for the kernel to tell the client
core enough information to rebuild the dynamic mount tables in the system
interface.
because of this, the pvfs2-client-core program *requires* pthreads. This is
completely independent of whether or not the PVFS2 sysint needs pthreads
at all.
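The remount mechanism can be pictured as a list of saved mount records that is replayed through an upcall on request. The sketch below is a user-space illustration with hypothetical names and a fixed-size array; the kernel side keeps superblocks and the real trigger is an ioctl issued from a client-core thread, neither of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_MOUNTS 8

/* Hypothetical record of what was passed at mount time. */
struct demo_mount_info {
    char devname[128];
};

/* Hypothetical list of known mounts. */
struct demo_sb_list {
    struct demo_mount_info mounts[DEMO_MAX_MOUNTS];
    size_t count;
};

/* Remember a mount so it can be replayed later; returns 0 on success. */
int demo_sb_list_add(struct demo_sb_list *l, const char *devname)
{
    if (l->count == DEMO_MAX_MOUNTS ||
        strlen(devname) >= sizeof(l->mounts[0].devname))
        return -1;
    strcpy(l->mounts[l->count].devname, devname);
    l->count++;
    return 0;
}

/* Callback type standing in for a mount upcall to the client-core. */
typedef int (*demo_mount_upcall_fn)(const struct demo_mount_info *info,
                                    void *ctx);

/* Issue one mount upcall per known file system; returns the number of
 * upcalls that failed. */
int demo_remount_all(const struct demo_sb_list *l,
                     demo_mount_upcall_fn upcall, void *ctx)
{
    size_t i;
    int failures = 0;

    for (i = 0; i < l->count; i++) {
        if (upcall(&l->mounts[i], ctx) != 0)
            failures++;
    }
    return failures;
}

/* Example upcall: counts how many mounts were replayed. */
int demo_count_upcall(const struct demo_mount_info *info, void *ctx)
{
    (void)info;
    (*(size_t *)ctx)++;
    return 0;
}
```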
combo PVFS_id_gen_t -> PVFS_BMI_addr_t, formatting, PVFS error code patch. Ugly. Sorry if my formatting pisses someone off; at least I'm not using > 80 columns any more :).
small device endianness fix
some 'strict' warning removals
interesting 64-bit related changes
a more indexified interface to the shared kernel/userspace memory region used through the device file
user level hooks for getting kernel/user shared buffer
Renamed id_gen_t to PVFS_id_gen_t and moved its definition into pvfs2-types.h. Also took the #include for id-generator.h out of header files and into .c files where possible.
changed #include <stdint.h> to #include <inttypes.h> (slightly more portable)
filled in the write functions and tested
filled in write function; untested
Got rid of PINT_dev_write() function and replaced with a macro that just calls PINT_dev_write_list(), trivially implemented memalloc and memfree functions
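Replacing a single-buffer call with a macro over the list variant can be sketched as follows. The names and the signature are hypothetical and do not reproduce the real PINT_dev_write_list() prototype; the list function here only totals the sizes instead of writing to a device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list write: a stand-in that validates the buffers and
 * returns the total number of bytes that would be written. */
long demo_dev_write_list(const void *const *buffers,
                         const size_t *sizes, int count)
{
    long total = 0;
    int i;

    for (i = 0; i < count; i++) {
        if (buffers[i] == NULL)
            return -1;
        total += (long)sizes[i];
    }
    return total;
}

/* Single-buffer write expressed as a one-element list write, using C99
 * compound literals to build the one-element arrays in place. */
#define demo_dev_write(buf, size) \
    demo_dev_write_list(&(const void *){ (buf) }, &(size_t){ (size) }, 1)
```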
improved error handling
tested the test_unexpected() function; seems to work fine
implemented test_unexpected and release_unexpected; untested
added ioctls to read some startup parameters from device; added a header file with some information that needs to be shared between kernel and user level
added code to create or fix /dev entries as needed; lifted from old pvfsd code
just checking in a little bit of progress
added code stubs for pvfs2-kernel device interface