Closed Bug 493541 Opened 16 years ago Closed 16 years ago

jemalloc integration cause crashes when libraries or plugins dlopen with RTLD_DEEPBIND

Categories

(Core :: Memory Allocator, defect, P1)

All
Linux
defect

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed
status1.9.1 --- .3-fixed

People

(Reporter: wolfiR, Assigned: karlt)

References

Details

(Keywords: crash, topcrash)

Attachments

(1 file, 2 obsolete files)

(This is related to bug 473428.) Apparently the jemalloc integration can confuse in-process library functions when libraries do not use jemalloc themselves but reference malloc() and free() through the process's memory map. (Sorry, I'm not a low-level expert.) Here is a bug report mentioning two examples: https://bugzilla.novell.com/show_bug.cgi?id=503151 And an explanation I found is: https://bugzilla.novell.com/show_bug.cgi?id=477061#c11 which led me to this bug report.
Severity: normal → critical
Flags: blocking1.9.1?
Blocks: 473428
According to https://bugzilla.novell.com/show_bug.cgi?id=503151#c5 this is not something that needs to be fixed on the Mozilla side. Here's the explanation from that comment (NSS here is glibc's Name Service Switch, not Network Security Services): "We currently do not support custom malloc() implementation in NSS due to our patch to open NSS modules deep-bound (that is meant to protect the main process from library namespace pollution by libraries the NSS module depends on - e.g. Thunderbird depended on one kind of OpenLDAP library, while nss_ldap depended on an entirely incompatible one). This causes the main process to use the custom malloc(), but the NSS module to use the stock free()."
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
Flags: blocking1.9.1? → blocking1.9.1-
I'm reopening this, because as well as the name service switch module loading issues (which show up as bug 473428, https://bugzilla.novell.com/show_bug.cgi?id=503151 and https://bugs.gentoo.org/show_bug.cgi?id=252302), the same issue is affecting the Flash plugin (bug 469439).
Assignee: nobody → mozbugz
Blocks: 469439
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: jemalloc integration can cause crashes in certain environments → jemalloc integration cause crashes when libraries or plugins dlopen with RTLD_DEEPBIND
Version: 1.9.1 Branch → Trunk
Excerpts from what Ulrich Drepper says about the RTLD_DEEPBIND flag he added ("How To Write Shared Libraries", August 20, 2006, http://people.redhat.com/drepper/dsohowto.pdf):

"this feature should only be used if it cannot be avoided. There are several reasons for this: The change in the scope affects all symbols and all the DSOs which are loaded. Some symbols might have to be interposed by definitions in the global scope which now will not happen. Already loaded DSOs are not affected which could cause unconsistent results depending on whether the DSO is already loaded (it might be dynamically loaded, so there is even a race condition). ... The RTLD_DEEPBIND flag should really only be used as a last resort. Fixing the application to not depend on the flag's functionality is the much better solution."

The inconsistency that RTLD_DEEPBIND causes with jemalloc is that dynamic libraries opened with RTLD_DEEPBIND will use libc's malloc while libc itself is still using jemalloc. A libc function may return a pointer to something that should be passed to free, and the dynamic library will call libc's free, but libc used jemalloc to allocate the memory.

I raised a question on this behavior here: http://sourceware.org/ml/libc-alpha/2009-06/msg00168.html

But it looks like we can make libc's free (and malloc, etc.) use jemalloc: http://www.gnu.org/s/libc/manual/html_node/Hooks-for-Malloc.html
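To make the failure mode concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions: the two "allocators" are hypothetical stand-ins (`a_malloc`/`a_free` playing the role of the replacement allocator, `b_malloc`/`b_free` the stock one), and each tags its blocks so a foreign free can be detected rather than corrupting the heap. Neither jemalloc nor glibc works this way internally; the point is only that a block must go back to the allocator that produced it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tags standing in for two unrelated heap implementations. */
#define TAG_A 0x6a656d61u
#define TAG_B 0x676c6962u

/* Per-block header; the union pads it so the payload stays aligned. */
typedef union {
    uint32_t tag;
    max_align_t pad;
} toy_header;

/* Allocate a block and record which toy allocator owns it. */
static void *toy_alloc(uint32_t tag, size_t size) {
    toy_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->tag = tag;
    return h + 1;
}

/* Returns 0 on success, -1 if the block was produced by the other allocator. */
static int toy_free(uint32_t tag, void *ptr) {
    assert(ptr != NULL);
    toy_header *h = (toy_header *)ptr - 1;
    if (h->tag != tag)
        return -1; /* a real allocator would likely crash or corrupt state here */
    h->tag = 0;
    free(h);
    return 0;
}

void *a_malloc(size_t n) { return toy_alloc(TAG_A, n); }
int a_free(void *p) { return toy_free(TAG_A, p); }
void *b_malloc(size_t n) { return toy_alloc(TAG_B, n); }
int b_free(void *p) { return toy_free(TAG_B, p); }
```

In this model, a pointer from `a_malloc` passed to `b_free` is rejected, which corresponds to the deep-bound library calling libc's free on memory that libc obtained from jemalloc.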
Blocks: 479199
I wonder whether we ever build against one glibc and expect to run against a different glibc. I'm hoping this will fix the bug, but I'm not able to test right now. The jemalloc dependency in the build system is broken, so OBJ-DIR/browser/app/firefox-bin must be explicitly removed to pick up the changes.
Comment on attachment 386244 (hook jemalloc into glibc's malloc)
This doesn't work, as glibc does not run __malloc_initialize_hook on free. (The assumption is probably that glibc's malloc or similar would have been called before free, but that's not happening here.)
Attachment #386244 - Attachment is obsolete: true
We shouldn't need to use __malloc_initialize_hook because the hook functions will not call glibc malloc functions. This patch uses symbol interposing to set the 4 hooks. With this patch, the initial crash of bug 469439 is avoided, but I'm having trouble testing with my setup here. I get a different (slightly later) crash with this patch, but I seem to get the same crash without jemalloc, so it may just be related to the hackish way that I've installed NVIDIA's libGL. I'd appreciate it if someone could help by testing this patch. You'll need to explicitly remove OBJ-DIR/browser/app/firefox-bin before the build.
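The difference between the two patches can be illustrated with a small simulation in C. This is a sketch under stated assumptions, not the patch itself and not glibc's code: `struct toy_libc` is a hypothetical model of an allocator that consults hook pointers, with a lazy initializer that only the malloc path runs (as described above for __malloc_initialize_hook). `repl_malloc` and `repl_free` are hypothetical stand-ins for the replacement allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Call counters for the hypothetical replacement allocator. */
int repl_mallocs, repl_frees;
static void *repl_malloc(size_t n) { repl_mallocs++; return malloc(n); }
static void repl_free(void *p) { repl_frees++; free(p); }

/* Toy model of a libc allocator with hook pointers. */
struct toy_libc {
    void *(*malloc_hook)(size_t);
    void (*free_hook)(void *);
    void (*init_hook)(struct toy_libc *);
    int initialized;
    int stock_frees; /* frees that fell through to the stock path */
};

void *toy_libc_malloc(struct toy_libc *lc, size_t n) {
    assert(lc != NULL);
    if (!lc->initialized) { /* lazy initialization happens only here */
        lc->initialized = 1;
        if (lc->init_hook)
            lc->init_hook(lc);
    }
    return lc->malloc_hook ? lc->malloc_hook(n) : malloc(n);
}

void toy_libc_free(struct toy_libc *lc, void *p) {
    assert(lc != NULL);
    if (lc->free_hook) { /* no lazy initialization on this path */
        lc->free_hook(p);
        return;
    }
    lc->stock_frees++;
    free(p);
}

void install_hooks(struct toy_libc *lc) {
    lc->malloc_hook = repl_malloc;
    lc->free_hook = repl_free;
}

/* First approach: hooks are installed by the initialize hook. */
struct toy_libc lazy_libc(void) {
    struct toy_libc lc = {0};
    lc.init_hook = install_hooks;
    return lc;
}

/* Second approach: hooks are already set before any call is made. */
struct toy_libc eager_libc(void) {
    struct toy_libc lc = {0};
    install_hooks(&lc);
    return lc;
}
```

With `lazy_libc()`, a free that arrives before any malloc takes the stock path, which is the mismatch seen with the first attachment. With `eager_libc()`, the hook is in place from the start, which is the effect the second patch aims for by defining the hook variables directly.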
I can confirm that without the patch, a build of SeaMonkey on top of 1.9.2 mozilla-central code crashes at print preview, while with only attachment 386469 applied in addition, print preview works fine. Nice work!
Thanks very much, Robert. This also fixes bug 469439. (I managed to use the correct libnvidia-tls.so.1.)
Attachment #386469 - Flags: review?(jasone)
Blocks: 473629
According to the feedback in https://bugzilla.novell.com/show_bug.cgi?id=503151 your patch fixes the issues we've seen.
I am sorry for my English, but I was sent here from http://bugs.archlinux.org/task/15441. I am very weak in programming and do not know any programming language. I just wanted to say that I have a problem with the browser when using Macromedia/Adobe Flash. I am ready to share any technical information that is required. Thanks.
Given that this causes problems with flash in at least some cases (bug 469439), I think we should fix this for 1.9.2 (and 1.9.1.x as well).
blocking1.9.1: --- → ?
Flags: blocking1.9.2?
This is the #1 Firefox 3.5.1 crash on Linux at the moment.
Keywords: crash, topcrash
From all I hear from the Novell/openSUSE side of things, the patch is used in builds they ship now and users cheer for it as the problems seem to be gone. We really should get this into both 1.9.2 and 1.9.1 ASAP.
Comment on attachment 386469 (hook jemalloc into glibc's malloc (without __malloc_initialize_hook))
I don't understand the "elif !defined(malloc)" bit here. Can you explain the purpose of that clause?
(In reply to comment #14)
> #1 Firefox 3.5.1 crash on Linux ATM

What is this based on? I don't think it's based on our stats, because the highest crash signature has four crashes in the last week...
Comment on attachment 386469 (hook jemalloc into glibc's malloc (without __malloc_initialize_hook))

(In reply to comment #17)
> I don't understand the "elif !defined(malloc) bit here... can you explain the
> purpose of that clause?

I saw this code

/* Mangle standard interfaces on Darwin and Windows CE,
   in order to avoid linking problems. */
#if defined(MOZ_MEMORY_DARWIN)
#define malloc(a) moz_malloc(a)
#define valloc(a) moz_valloc(a)
#define calloc(a, b) moz_calloc(a, b)
#define realloc(a, b) moz_realloc(a, b)
#define free(a) moz_free(a)
#endif

http://hg.mozilla.org/mozilla-central/annotate/55955ee71c10/memory/jemalloc/jemalloc.c#l6126

and assumed that in some cases jemalloc does not replace the system malloc but is used as an alternative allocator in parallel to the system malloc (used only in cases where mixing of allocate/free implementations can be avoided).
Attachment #386469 - Flags: review?(jasone) → review?(benjamin)
(In reply to comment #20)
> and assumed that in some cases jemalloc does not replace the system malloc but
> is used as an alternative allocator in parallel to the system malloc (used
> only in cases where mixing of allocate/free implementations can be avoided).

On Mac they use this zone allocator nonsense, so malloc basically calls into zone[0] and does an allocation. free() loops through each zone, asking whether it owns the allocation, and then calls free on that zone. On Mac with jemalloc (which we don't actually use at the moment), we set up a zone and replace the default zone with our own, so we need to define our functions as something other than malloc, etc. We still replace the system allocations.
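The zone behavior described above can be sketched as a toy registry in C. Everything here is an assumption made for illustration: `toy_zone`, `zone_register`, `zone_make_default`, `zone_malloc` and `zone_free` are hypothetical names, and the bump arena is far simpler than a real zone. It only shows the dispatch rule: allocation goes to the default zone, while free asks each zone whether it owns the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZONE_BYTES 256
#define MAX_ZONES 4

/* Hypothetical zone: a tiny bump arena that can tell whether it owns a pointer. */
typedef struct {
    _Alignas(16) unsigned char arena[ZONE_BYTES];
    size_t used;
    int frees; /* number of frees routed to this zone */
} toy_zone;

static toy_zone *zones[MAX_ZONES];
static size_t zone_count;

/* Register a zone; the first one registered starts out as the default. */
int zone_register(toy_zone *z) {
    assert(z != NULL);
    if (zone_count == MAX_ZONES)
        return -1;
    zones[zone_count++] = z;
    return 0;
}

/* Move an already registered zone to index 0, making it the default. */
int zone_make_default(toy_zone *z) {
    for (size_t i = 0; i < zone_count; i++) {
        if (zones[i] == z) {
            zones[i] = zones[0];
            zones[0] = z;
            return 0;
        }
    }
    return -1;
}

static int zone_owns(const toy_zone *z, const void *p) {
    uintptr_t addr = (uintptr_t)p, base = (uintptr_t)z->arena;
    return addr >= base && addr < base + ZONE_BYTES;
}

/* Allocation always comes from the default zone (index 0). */
void *zone_malloc(size_t n) {
    if (zone_count == 0)
        return NULL;
    toy_zone *z = zones[0];
    size_t need = (n + 15u) & ~(size_t)15u; /* keep blocks 16-byte aligned */
    if (need < n || need > ZONE_BYTES - z->used)
        return NULL;
    void *p = z->arena + z->used;
    z->used += need;
    return p;
}

/* Free asks each zone in turn; returns -1 if no zone owns the pointer. */
int zone_free(void *p) {
    for (size_t i = 0; i < zone_count; i++) {
        if (zone_owns(zones[i], p)) {
            zones[i]->frees++; /* a bump arena has nothing to release */
            return 0;
        }
    }
    return -1;
}
```

Because ownership is checked per pointer, blocks handed out before the default zone was replaced still find their way back to the zone that produced them, which is why this scheme tolerates a replacement allocator in a way the deep-bound glibc case does not.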
Thank you, Stuart, for the explanation. The behavior of this patch is the same as attachment 386469. The difference is that preprocessor conditionals are moved around a bit to make it clearer when each section is processed.
Attachment #386469 - Attachment is obsolete: true
Attachment #390399 - Flags: review?(benjamin)
Attachment #386469 - Flags: review?(benjamin)
blocking1.9.1: ? → needed
Depends on: 506845
Attachment #390399 - Flags: review?(benjamin) → review+
Is the patch scheduled for 3.5.2?
(In reply to comment #24)
> Is the patch scheduled for 3.5.2?

Not currently, no. A patch has not yet baked on trunk and is, therefore, not ready to land on the 1.9.1 branch.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Verified - Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090730 Minefield/3.6a1pre
Attachment #390399 - Flags: approval1.9.1.3?
Comment on attachment 390399 (hook jemalloc into glibc's malloc v2.1)
Approved for 1.9.1.3. a=ss
Attachment #390399 - Flags: approval1.9.1.3? → approval1.9.1.3+
(In reply to comment #33)
> http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d919708797fa

Hi, I'm from Venezuela and I have the error described here. I see that it is resolved here, but I don't have much experience with this and I don't know exactly what I should do to fix the problem on my machine. Can you help me?
(In reply to comment #35)
> A build with the fix can be downloaded from here:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.1/

Thanks, but that build is in English, and I think that version is not an official published release yet, so I can wait until it is published. I can already watch videos in fullscreen by disabling hardware acceleration in the Flash Player configuration. Thank you for your help!
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P1
blocking1.9.1: needed → ---
Blocks: songbird
Is this fixed in 3.5.4?
Should have been fixed in 3.5.3 as noted by the .3-fixed entry in the status1.9.1 field.
blocking-b2g: 2.2r? → ---